Skip to content

Instantly share code, notes, and snippets.

@Vidhi1290
Created September 16, 2022 14:40
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Vidhi1290/c9a6046f079fd5abafb7583d3689a410 to your computer and use it in GitHub Desktop.
Save Vidhi1290/c9a6046f079fd5abafb7583d3689a410 to your computer and use it in GitHub Desktop.
RandomForest.ipynb
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/Vidhi1290/c9a6046f079fd5abafb7583d3689a410/randomforest.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc": true,
"id": "Tv7Ijtj8xhC4"
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Problem\" data-toc-modified-id=\"Problem-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Problem</a></span><ul class=\"toc-item\"><li><span><a href=\"#What-is-a-Backorder?\" data-toc-modified-id=\"What-is-a-Backorder?-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>What is a Backorder?</a></span></li><li><span><a href=\"#Data-description\" data-toc-modified-id=\"Data-description-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Data description</a></span></li></ul></li><li><span><a href=\"#Loading-the-required-libraries\" data-toc-modified-id=\"Loading-the-required-libraries-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Loading the required libraries</a></span></li><li><span><a href=\"#Identify-Right-Error-Metrics\" data-toc-modified-id=\"Identify-Right-Error-Metrics-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Identify Right Error Metrics</a></span><ul class=\"toc-item\"><li><span><a href=\"#Confusion-Matrix\" data-toc-modified-id=\"Confusion-Matrix-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>Confusion Matrix</a></span></li><li><span><a href=\"#Function-to-calculate-required-metrics\" data-toc-modified-id=\"Function-to-calculate-required-metrics-3.2\"><span class=\"toc-item-num\">3.2&nbsp;&nbsp;</span>Function to calculate required metrics</a></span></li></ul></li><li><span><a href=\"#Loading-the-data\" data-toc-modified-id=\"Loading-the-data-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Loading the data</a></span></li><li><span><a href=\"#Understand-the-Data---Exploratory-Data-Analysis-(EDA)\" data-toc-modified-id=\"Understand-the-Data---Exploratory-Data-Analysis-(EDA)-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Understand the Data - Exploratory Data Analysis (EDA)</a></span><ul class=\"toc-item\"><li><span><a href=\"#Number-row-and-columns\" data-toc-modified-id=\"Number-row-and-columns-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Number row and columns</a></span></li><li><span><a href=\"#First-and-last-5-rows\" data-toc-modified-id=\"First-and-last-5-rows-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>First and last 5 rows</a></span></li><li><span><a href=\"#Statistic-summary\" data-toc-modified-id=\"Statistic-summary-5.3\"><span class=\"toc-item-num\">5.3&nbsp;&nbsp;</span>Statistic summary</a></span></li><li><span><a href=\"#Data-type\" data-toc-modified-id=\"Data-type-5.4\"><span class=\"toc-item-num\">5.4&nbsp;&nbsp;</span>Data type</a></span></li></ul></li><li><span><a href=\"#Data-pre-processing\" data-toc-modified-id=\"Data-pre-processing-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Data pre-processing</a></span><ul class=\"toc-item\"><li><span><a href=\"#Convert-all-the-attributes-to-appropriate-type\" data-toc-modified-id=\"Convert-all-the-attributes-to-appropriate-type-6.1\"><span class=\"toc-item-num\">6.1&nbsp;&nbsp;</span>Convert all the attributes to appropriate type</a></span><ul class=\"toc-item\"><li><span><a href=\"#Re-display-data-type-of-each-variable\" data-toc-modified-id=\"Re-display-data-type-of-each-variable-6.1.1\"><span class=\"toc-item-num\">6.1.1&nbsp;&nbsp;</span>Re-display data type of each variable</a></span></li><li><span><a href=\"#Statistic-summary\" data-toc-modified-id=\"Statistic-summary-6.1.2\"><span class=\"toc-item-num\">6.1.2&nbsp;&nbsp;</span>Statistic summary</a></span></li></ul></li><li><span><a href=\"#Delete-sku-attribute\" data-toc-modified-id=\"Delete-sku-attribute-6.2\"><span class=\"toc-item-num\">6.2&nbsp;&nbsp;</span>Delete sku attribute</a></span></li><li><span><a href=\"#Missing-Data\" data-toc-modified-id=\"Missing-Data-6.3\"><span class=\"toc-item-num\">6.3&nbsp;&nbsp;</span>Missing Data</a></span></li><li><span><a href=\"#Train-and-test-split\" data-toc-modified-id=\"Train-and-test-split-6.4\"><span class=\"toc-item-num\">6.4&nbsp;&nbsp;</span>Train and test split</a></span><ul class=\"toc-item\"><li><span><a href=\"#Target-attribute-distribution\" data-toc-modified-id=\"Target-attribute-distribution-6.4.1\"><span class=\"toc-item-num\">6.4.1&nbsp;&nbsp;</span>Target attribute distribution</a></span></li><li><span><a href=\"#Split-the-data-into-train-and-test\" data-toc-modified-id=\"Split-the-data-into-train-and-test-6.4.2\"><span class=\"toc-item-num\">6.4.2&nbsp;&nbsp;</span>Split the data into train and test</a></span></li><li><span><a href=\"#Target-attribute-distribution-after-the-split\" data-toc-modified-id=\"Target-attribute-distribution-after-the-split-6.4.3\"><span class=\"toc-item-num\">6.4.3&nbsp;&nbsp;</span>Target attribute distribution after the split</a></span></li></ul></li><li><span><a href=\"#Convert-categorical-target-attribute-to-numeric\" data-toc-modified-id=\"Convert-categorical-target-attribute-to-numeric-6.5\"><span class=\"toc-item-num\">6.5&nbsp;&nbsp;</span>Convert categorical target attribute to numeric</a></span><ul class=\"toc-item\"><li><span><a href=\"#Target-attribute-distribution\" data-toc-modified-id=\"Target-attribute-distribution-6.5.1\"><span class=\"toc-item-num\">6.5.1&nbsp;&nbsp;</span>Target attribute distribution</a></span></li></ul></li><li><span><a href=\"#Checking-the-data-types\" data-toc-modified-id=\"Checking-the-data-types-6.6\"><span class=\"toc-item-num\">6.6&nbsp;&nbsp;</span>Checking the data types</a></span></li><li><span><a href=\"#Standardize-the-numerical-attributes\" data-toc-modified-id=\"Standardize-the-numerical-attributes-6.7\"><span class=\"toc-item-num\">6.7&nbsp;&nbsp;</span>Standardize the numerical attributes</a></span><ul class=\"toc-item\"><li><span><a href=\"#Store-numerical-attributes-name\" data-toc-modified-id=\"Store-numerical-attributes-name-6.7.1\"><span class=\"toc-item-num\">6.7.1&nbsp;&nbsp;</span>Store numerical attributes name</a></span></li><li><span><a href=\"#Using-StandardScaler,-standardize-the-numerical-attributes\" data-toc-modified-id=\"Using-StandardScaler,-standardize-the-numerical-attributes-6.7.2\"><span class=\"toc-item-num\">6.7.2&nbsp;&nbsp;</span>Using StandardScaler, standardize the numerical attributes</a></span></li></ul></li><li><span><a href=\"#Converting-Categorical-attributes-to-Numeric-attributes\" data-toc-modified-id=\"Converting-Categorical-attributes-to-Numeric-attributes-6.8\"><span class=\"toc-item-num\">6.8&nbsp;&nbsp;</span>Converting Categorical attributes to Numeric attributes</a></span><ul class=\"toc-item\"><li><span><a href=\"#Store-categorical-attributes-name\" data-toc-modified-id=\"Store-categorical-attributes-name-6.8.1\"><span class=\"toc-item-num\">6.8.1&nbsp;&nbsp;</span>Store categorical attributes name</a></span></li><li><span><a href=\"#Using-OneHotEncoder,--converting-Categorical-attributes-to-Numeric-attributes\" data-toc-modified-id=\"Using-OneHotEncoder,--converting-Categorical-attributes-to-Numeric-attributes-6.8.2\"><span class=\"toc-item-num\">6.8.2&nbsp;&nbsp;</span>Using OneHotEncoder, converting Categorical attributes to Numeric attributes</a></span></li></ul></li><li><span><a href=\"#Concatenate-attribute\" data-toc-modified-id=\"Concatenate-attribute-6.9\"><span class=\"toc-item-num\">6.9&nbsp;&nbsp;</span>Concatenate attribute</a></span></li></ul></li><li><span><a href=\"#Model-building\" data-toc-modified-id=\"Model-building-7\"><span class=\"toc-item-num\">7&nbsp;&nbsp;</span>Model building</a></span><ul class=\"toc-item\"><li><span><a href=\"#RandomForestClassifier-Model\" data-toc-modified-id=\"RandomForestClassifier-Model-7.1\"><span class=\"toc-item-num\">7.1&nbsp;&nbsp;</span>RandomForestClassifier Model</a></span><ul class=\"toc-item\"><li><span><a href=\"#Instantiate-Model\" data-toc-modified-id=\"Instantiate-Model-7.1.1\"><span class=\"toc-item-num\">7.1.1&nbsp;&nbsp;</span>Instantiate Model</a></span></li><li><span><a href=\"#Train-Model\" data-toc-modified-id=\"Train-Model-7.1.2\"><span class=\"toc-item-num\">7.1.2&nbsp;&nbsp;</span>Train Model</a></span></li><li><span><a href=\"#List-important-features\" data-toc-modified-id=\"List-important-features-7.1.3\"><span class=\"toc-item-num\">7.1.3&nbsp;&nbsp;</span>List important features</a></span></li><li><span><a href=\"#Predict\" data-toc-modified-id=\"Predict-7.1.4\"><span class=\"toc-item-num\">7.1.4&nbsp;&nbsp;</span>Predict</a></span></li><li><span><a href=\"#Evaluate\" data-toc-modified-id=\"Evaluate-7.1.5\"><span class=\"toc-item-num\">7.1.5&nbsp;&nbsp;</span>Evaluate</a></span></li></ul></li><li><span><a href=\"#Up-sampling\" data-toc-modified-id=\"Up-sampling-7.2\"><span class=\"toc-item-num\">7.2&nbsp;&nbsp;</span>Up-sampling</a></span><ul class=\"toc-item\"><li><span><a href=\"#Instantiate-SMOTE\" data-toc-modified-id=\"Instantiate-SMOTE-7.2.1\"><span class=\"toc-item-num\">7.2.1&nbsp;&nbsp;</span>Instantiate SMOTE</a></span></li><li><span><a href=\"#Fit-Sample\" data-toc-modified-id=\"Fit-Sample-7.2.2\"><span class=\"toc-item-num\">7.2.2&nbsp;&nbsp;</span>Fit Sample</a></span></li></ul></li><li><span><a href=\"#RandomForestClassifier-with-up-sample-data\" data-toc-modified-id=\"RandomForestClassifier-with-up-sample-data-7.3\"><span class=\"toc-item-num\">7.3&nbsp;&nbsp;</span>RandomForestClassifier with up-sample data</a></span><ul class=\"toc-item\"><li><span><a href=\"#Instantiate-Model\" data-toc-modified-id=\"Instantiate-Model-7.3.1\"><span class=\"toc-item-num\">7.3.1&nbsp;&nbsp;</span>Instantiate Model</a></span></li><li><span><a href=\"#Train-the-model\" data-toc-modified-id=\"Train-the-model-7.3.2\"><span class=\"toc-item-num\">7.3.2&nbsp;&nbsp;</span>Train the model</a></span></li><li><span><a href=\"#List-important-features\" data-toc-modified-id=\"List-important-features-7.3.3\"><span class=\"toc-item-num\">7.3.3&nbsp;&nbsp;</span>List important features</a></span></li><li><span><a href=\"#Predict\" data-toc-modified-id=\"Predict-7.3.4\"><span class=\"toc-item-num\">7.3.4&nbsp;&nbsp;</span>Predict</a></span></li><li><span><a href=\"#Evaluate\" data-toc-modified-id=\"Evaluate-7.3.5\"><span class=\"toc-item-num\">7.3.5&nbsp;&nbsp;</span>Evaluate</a></span></li></ul></li><li><span><a href=\"#Hyper-parameter-tuning-using-Grid-Search-and-Cross-Validation\" data-toc-modified-id=\"Hyper-parameter-tuning-using-Grid-Search-and-Cross-Validation-7.4\"><span class=\"toc-item-num\">7.4&nbsp;&nbsp;</span>Hyper-parameter tuning using Grid Search and Cross Validation</a></span><ul class=\"toc-item\"><li><span><a href=\"#Parameters-to-test\" data-toc-modified-id=\"Parameters-to-test-7.4.1\"><span class=\"toc-item-num\">7.4.1&nbsp;&nbsp;</span>Parameters to test</a></span></li><li><span><a href=\"#Instantiate-Decision-Tree\" data-toc-modified-id=\"Instantiate-Decision-Tree-7.4.2\"><span class=\"toc-item-num\">7.4.2&nbsp;&nbsp;</span>Instantiate Decision Tree</a></span></li><li><span><a href=\"#Instantiate-GridSearchCV\" data-toc-modified-id=\"Instantiate-GridSearchCV-7.4.3\"><span class=\"toc-item-num\">7.4.3&nbsp;&nbsp;</span>Instantiate GridSearchCV</a></span></li><li><span><a href=\"#Train-RandomForest-using-GridSearchCV\" data-toc-modified-id=\"Train-RandomForest-using-GridSearchCV-7.4.4\"><span class=\"toc-item-num\">7.4.4&nbsp;&nbsp;</span>Train DT using GridSearchCV</a></span></li><li><span><a href=\"#Best-Params\" data-toc-modified-id=\"Best-Params-7.4.5\"><span class=\"toc-item-num\">7.4.5&nbsp;&nbsp;</span>Best Params</a></span></li><li><span><a href=\"#Predict\" data-toc-modified-id=\"Predict-7.4.6\"><span class=\"toc-item-num\">7.4.6&nbsp;&nbsp;</span>Predict</a></span></li><li><span><a href=\"#Evaluate\" data-toc-modified-id=\"Evaluate-7.4.7\"><span class=\"toc-item-num\">7.4.7&nbsp;&nbsp;</span>Evaluate</a></span></li></ul></li><li><span><a href=\"#Building-RandomForestClassifier-Model-using-Variable-Importance\" data-toc-modified-id=\"Building-RandomForestClassifier-Model-using-Variable-Importance-7.5\"><span class=\"toc-item-num\">7.5&nbsp;&nbsp;</span>Building RandomForestClassifier Model using Variable Importance</a></span><ul class=\"toc-item\"><li><span><a href=\"#Instantiate-Model\" data-toc-modified-id=\"Instantiate-Model-7.5.1\"><span class=\"toc-item-num\">7.5.1&nbsp;&nbsp;</span>Instantiate Model</a></span></li><li><span><a href=\"#Train-the-model\" data-toc-modified-id=\"Train-the-model-7.5.2\"><span class=\"toc-item-num\">7.5.2&nbsp;&nbsp;</span>Train the model</a></span></li><li><span><a href=\"#Predict\" data-toc-modified-id=\"Predict-7.5.3\"><span class=\"toc-item-num\">7.5.3&nbsp;&nbsp;</span>Predict</a></span></li><li><span><a href=\"#Evaluate\" data-toc-modified-id=\"Evaluate-7.5.4\"><span class=\"toc-item-num\">7.5.4&nbsp;&nbsp;</span>Evaluate</a></span></li></ul></li></ul></li></ul></div>"
]
},
{
"cell_type": "code",
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2oybzatEx73S",
"outputId": "ec3191b4-c5b0-4cbf-e6ff-7a73c4805d64"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zYbYeE78xhC7"
},
"source": [
"# Problem\n",
"\n",
"Is to identify products at risk of backorder before the event occurs so that business has time to react. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YiwpsyI3xhC7"
},
"source": [
"## What is a Backorder?\n",
"Backorders are products that are temporarily out of stock, but a customer is permitted to place an order against future inventory. \n",
"A backorder generally indicates that customer demand for a product or service exceeds a company’s capacity to supply it. Back orders are both good and bad. Strong demand can drive back orders, but so can suboptimal planning. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jhTPJdI3xhC7"
},
"source": [
"## Data description\n",
"\n",
"Data file contains the historical data for the 8 weeks prior to the week we are trying to predict. The data was taken as weekly snapshots at the start of each week. Columns are defined as follows:\n",
"\n",
" sku - Random ID for the product\n",
"\n",
" national_inv - Current inventory level for the part\n",
"\n",
" lead_time - Transit time for product (if available)\n",
"\n",
" in_transit_qty - Amount of product in transit from source\n",
"\n",
" forecast_3_month - Forecast sales for the next 3 months\n",
"\n",
" forecast_6_month - Forecast sales for the next 6 months\n",
"\n",
" forecast_9_month - Forecast sales for the next 9 months\n",
"\n",
" sales_1_month - Sales quantity for the prior 1 month time period\n",
"\n",
" sales_3_month - Sales quantity for the prior 3 month time period\n",
"\n",
" sales_6_month - Sales quantity for the prior 6 month time period\n",
"\n",
" sales_9_month - Sales quantity for the prior 9 month time period\n",
"\n",
" min_bank - Minimum recommend amount to stock\n",
"\n",
" potential_issue - Source issue for part identified\n",
"\n",
" pieces_past_due - Parts overdue from source\n",
"\n",
" perf_6_month_avg - Source performance for prior 6 month period\n",
"\n",
" perf_12_month_avg - Source performance for prior 12 month period\n",
"\n",
" local_bo_qty - Amount of stock orders overdue\n",
"\n",
" deck_risk - Part risk flag\n",
"\n",
" oe_constraint - Part risk flag\n",
"\n",
" ppap_risk - Part risk flag\n",
"\n",
" stop_auto_buy - Part risk flag\n",
"\n",
" rev_stop - Part risk flag\n",
"\n",
" went_on_backorder - Product actually went on backorder. This is the target value.\n",
" \n",
" Yes or 1 : Product backordered\n",
"\n",
" No or 0 : Product not backordered"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YcT34n5_xhC8"
},
"source": [
"# Loading the required libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true,
"id": "_vbbMyoDxhC8"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler\n",
"\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"from imblearn.over_sampling import SMOTE\n",
"\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, roc_curve, auc\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_VGhfN2YxhC9"
},
"source": [
"# Identify Right Error Metrics\n",
"\n",
" Based on the business have to identify the right error metrics."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_aV_fIsnxhC-"
},
"source": [
"## Function to calculate required metrics"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "SqPyGdkQxhC-"
},
"outputs": [],
"source": [
"def evaluate_model(act, pred):\n",
" print(\"Confusion Matrix \\n\", confusion_matrix(act, pred))\n",
" print(\"Accurcay : \", accuracy_score(act, pred))\n",
" print(\"Recall : \", recall_score(act, pred))\n",
" print(\"Precision: \", precision_score(act, pred)) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kIp0b3B-xhC-"
},
"source": [
"# Loading the data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "331XYJT9xhC-"
},
"outputs": [],
"source": [
"data = pd.read_csv(\"/content/drive/MyDrive/Dataset-NMIMS/BackOrders.csv\", header=0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VfKAbz4BxhC-"
},
"source": [
"# Understand the Data - Exploratory Data Analysis (EDA)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aTDLWAulxhC-"
},
"source": [
"## Number row and columns"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EcXfaEYYxhC_",
"outputId": "3b5e7206-8427-488f-8ca8-03e33e4866cc"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(61589, 23)"
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"data.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wWHBZOidxhC_"
},
"source": [
"## First and last 5 rows"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"id": "jUQj_QbvxhC_",
"outputId": "ab090305-e96d-4490-b9bf-2b43f7145b5a"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" sku national_inv lead_time in_transit_qty forecast_3_month \\\n",
"0 1888279 117 NaN 0 0 \n",
"1 1870557 7 2.0 0 0 \n",
"2 1475481 258 15.0 10 10 \n",
"3 1758220 46 2.0 0 0 \n",
"4 1360312 2 2.0 0 4 \n",
"\n",
" forecast_6_month forecast_9_month sales_1_month sales_3_month \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 77 184 46 132 \n",
"3 0 0 1 2 \n",
"4 6 10 2 2 \n",
"\n",
" sales_6_month ... pieces_past_due perf_6_month_avg perf_12_month_avg \\\n",
"0 15 ... 0 -99.00 -99.00 \n",
"1 0 ... 0 0.50 0.28 \n",
"2 256 ... 0 0.54 0.70 \n",
"3 6 ... 0 0.75 0.90 \n",
"4 5 ... 0 0.97 0.92 \n",
"\n",
" local_bo_qty deck_risk oe_constraint ppap_risk stop_auto_buy rev_stop \\\n",
"0 0 No No Yes Yes No \n",
"1 0 Yes No No Yes No \n",
"2 0 No No No Yes No \n",
"3 0 Yes No No Yes No \n",
"4 0 No No No Yes No \n",
"\n",
" went_on_backorder \n",
"0 No \n",
"1 No \n",
"2 No \n",
"3 No \n",
"4 No \n",
"\n",
"[5 rows x 23 columns]"
],
"text/html": [
"\n",
" <div id=\"df-217bbe4e-8b71-49a1-a7aa-92b9f2d98f70\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sku</th>\n",
" <th>national_inv</th>\n",
" <th>lead_time</th>\n",
" <th>in_transit_qty</th>\n",
" <th>forecast_3_month</th>\n",
" <th>forecast_6_month</th>\n",
" <th>forecast_9_month</th>\n",
" <th>sales_1_month</th>\n",
" <th>sales_3_month</th>\n",
" <th>sales_6_month</th>\n",
" <th>...</th>\n",
" <th>pieces_past_due</th>\n",
" <th>perf_6_month_avg</th>\n",
" <th>perf_12_month_avg</th>\n",
" <th>local_bo_qty</th>\n",
" <th>deck_risk</th>\n",
" <th>oe_constraint</th>\n",
" <th>ppap_risk</th>\n",
" <th>stop_auto_buy</th>\n",
" <th>rev_stop</th>\n",
" <th>went_on_backorder</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1888279</td>\n",
" <td>117</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>15</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>-99.00</td>\n",
" <td>-99.00</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1870557</td>\n",
" <td>7</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.50</td>\n",
" <td>0.28</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1475481</td>\n",
" <td>258</td>\n",
" <td>15.0</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>77</td>\n",
" <td>184</td>\n",
" <td>46</td>\n",
" <td>132</td>\n",
" <td>256</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.54</td>\n",
" <td>0.70</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1758220</td>\n",
" <td>46</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.75</td>\n",
" <td>0.90</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1360312</td>\n",
" <td>2</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.97</td>\n",
" <td>0.92</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-217bbe4e-8b71-49a1-a7aa-92b9f2d98f70')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-217bbe4e-8b71-49a1-a7aa-92b9f2d98f70 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-217bbe4e-8b71-49a1-a7aa-92b9f2d98f70');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 6
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"id": "FNFShsVNxhC_",
"outputId": "941cee91-589b-49db-9991-c76f0f73b6fe"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" sku national_inv lead_time in_transit_qty forecast_3_month \\\n",
"61584 1397275 6 8.0 0 24 \n",
"61585 3072139 130 2.0 0 40 \n",
"61586 1909363 135 9.0 0 0 \n",
"61587 1845783 63 NaN 0 0 \n",
"61588 1200539 0 2.0 0 8 \n",
"\n",
" forecast_6_month forecast_9_month sales_1_month sales_3_month \\\n",
"61584 24 24 0 7 \n",
"61585 80 140 18 108 \n",
"61586 0 0 10 40 \n",
"61587 0 0 452 1715 \n",
"61588 8 8 0 1 \n",
"\n",
" sales_6_month ... pieces_past_due perf_6_month_avg \\\n",
"61584 9 ... 0 0.98 \n",
"61585 230 ... 0 0.51 \n",
"61586 65 ... 0 1.00 \n",
"61587 3425 ... 0 -99.00 \n",
"61588 1 ... 0 0.79 \n",
"\n",
" perf_12_month_avg local_bo_qty deck_risk oe_constraint ppap_risk \\\n",
"61584 0.98 0 No No No \n",
"61585 0.28 0 No No No \n",
"61586 0.99 0 No No Yes \n",
"61587 -99.00 1 No No No \n",
"61588 0.78 0 Yes No No \n",
"\n",
" stop_auto_buy rev_stop went_on_backorder \n",
"61584 Yes No No \n",
"61585 Yes No No \n",
"61586 Yes No No \n",
"61587 No No Yes \n",
"61588 Yes No Yes \n",
"\n",
"[5 rows x 23 columns]"
],
"text/html": [
"\n",
" <div id=\"df-59910827-0e9b-4962-98d8-19a782c6db4f\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sku</th>\n",
" <th>national_inv</th>\n",
" <th>lead_time</th>\n",
" <th>in_transit_qty</th>\n",
" <th>forecast_3_month</th>\n",
" <th>forecast_6_month</th>\n",
" <th>forecast_9_month</th>\n",
" <th>sales_1_month</th>\n",
" <th>sales_3_month</th>\n",
" <th>sales_6_month</th>\n",
" <th>...</th>\n",
" <th>pieces_past_due</th>\n",
" <th>perf_6_month_avg</th>\n",
" <th>perf_12_month_avg</th>\n",
" <th>local_bo_qty</th>\n",
" <th>deck_risk</th>\n",
" <th>oe_constraint</th>\n",
" <th>ppap_risk</th>\n",
" <th>stop_auto_buy</th>\n",
" <th>rev_stop</th>\n",
" <th>went_on_backorder</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>61584</th>\n",
" <td>1397275</td>\n",
" <td>6</td>\n",
" <td>8.0</td>\n",
" <td>0</td>\n",
" <td>24</td>\n",
" <td>24</td>\n",
" <td>24</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>9</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.98</td>\n",
" <td>0.98</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61585</th>\n",
" <td>3072139</td>\n",
" <td>130</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" <td>80</td>\n",
" <td>140</td>\n",
" <td>18</td>\n",
" <td>108</td>\n",
" <td>230</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.51</td>\n",
" <td>0.28</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61586</th>\n",
" <td>1909363</td>\n",
" <td>135</td>\n",
" <td>9.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>40</td>\n",
" <td>65</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>1.00</td>\n",
" <td>0.99</td>\n",
" <td>0</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61587</th>\n",
" <td>1845783</td>\n",
" <td>63</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>452</td>\n",
" <td>1715</td>\n",
" <td>3425</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>-99.00</td>\n",
" <td>-99.00</td>\n",
" <td>1</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61588</th>\n",
" <td>1200539</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0.79</td>\n",
" <td>0.78</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-59910827-0e9b-4962-98d8-19a782c6db4f')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-59910827-0e9b-4962-98d8-19a782c6db4f button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-59910827-0e9b-4962-98d8-19a782c6db4f');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"data.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZoYtcwPhxhC_"
},
"source": [
"## Statistic summary \n",
" Using describe function"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 488
},
"id": "0H-XRdqwxhC_",
"outputId": "9630616f-4c1f-46b8-e83c-a123a4ebb502"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" sku national_inv lead_time in_transit_qty \\\n",
"count 6.158900e+04 61589.000000 58186.000000 61589.000000 \n",
"unique NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN \n",
"mean 2.037188e+06 287.721882 7.559619 30.192843 \n",
"std 6.564178e+05 4233.906931 6.498952 792.869253 \n",
"min 1.068628e+06 -2999.000000 0.000000 0.000000 \n",
"25% 1.498574e+06 3.000000 4.000000 0.000000 \n",
"50% 1.898033e+06 10.000000 8.000000 0.000000 \n",
"75% 2.314826e+06 57.000000 8.000000 0.000000 \n",
"max 3.284895e+06 673445.000000 52.000000 170976.000000 \n",
"\n",
" forecast_3_month forecast_6_month forecast_9_month sales_1_month \\\n",
"count 6.158900e+04 6.158900e+04 6.158900e+04 61589.000000 \n",
"unique NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN \n",
"mean 1.692728e+02 3.150413e+02 4.535760e+02 44.742957 \n",
"std 5.286742e+03 9.774362e+03 1.420201e+04 1373.805831 \n",
"min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"50% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"75% 1.200000e+01 2.500000e+01 3.600000e+01 6.000000 \n",
"max 1.126656e+06 2.094336e+06 3.062016e+06 295197.000000 \n",
"\n",
" sales_3_month sales_6_month ... pieces_past_due perf_6_month_avg \\\n",
"count 61589.000000 6.158900e+04 ... 61589.000000 61589.000000 \n",
"unique NaN NaN ... NaN NaN \n",
"top NaN NaN ... NaN NaN \n",
"freq NaN NaN ... NaN NaN \n",
"mean 150.732631 2.835465e+02 ... 1.605400 -6.264182 \n",
"std 5224.959649 8.872270e+03 ... 42.309229 25.537906 \n",
"min 0.000000 0.000000e+00 ... 0.000000 -99.000000 \n",
"25% 0.000000 0.000000e+00 ... 0.000000 0.620000 \n",
"50% 2.000000 4.000000e+00 ... 0.000000 0.820000 \n",
"75% 17.000000 3.400000e+01 ... 0.000000 0.960000 \n",
"max 934593.000000 1.799099e+06 ... 7392.000000 1.000000 \n",
"\n",
" perf_12_month_avg local_bo_qty deck_risk oe_constraint ppap_risk \\\n",
"count 61589.000000 61589.000000 61589 61589 61589 \n",
"unique NaN NaN 2 2 2 \n",
"top NaN NaN No No No \n",
"freq NaN NaN 48145 61577 53792 \n",
"mean -5.863664 1.205361 NaN NaN NaN \n",
"std 24.844514 29.981155 NaN NaN NaN \n",
"min -99.000000 0.000000 NaN NaN NaN \n",
"25% 0.640000 0.000000 NaN NaN NaN \n",
"50% 0.800000 0.000000 NaN NaN NaN \n",
"75% 0.950000 0.000000 NaN NaN NaN \n",
"max 1.000000 2999.000000 NaN NaN NaN \n",
"\n",
" stop_auto_buy rev_stop went_on_backorder \n",
"count 61589 61589 61589 \n",
"unique 2 2 2 \n",
"top Yes No No \n",
"freq 59303 61569 50296 \n",
"mean NaN NaN NaN \n",
"std NaN NaN NaN \n",
"min NaN NaN NaN \n",
"25% NaN NaN NaN \n",
"50% NaN NaN NaN \n",
"75% NaN NaN NaN \n",
"max NaN NaN NaN \n",
"\n",
"[11 rows x 23 columns]"
],
"text/html": [
"\n",
" <div id=\"df-60908db2-7f9e-422e-93eb-15cca20acb09\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sku</th>\n",
" <th>national_inv</th>\n",
" <th>lead_time</th>\n",
" <th>in_transit_qty</th>\n",
" <th>forecast_3_month</th>\n",
" <th>forecast_6_month</th>\n",
" <th>forecast_9_month</th>\n",
" <th>sales_1_month</th>\n",
" <th>sales_3_month</th>\n",
" <th>sales_6_month</th>\n",
" <th>...</th>\n",
" <th>pieces_past_due</th>\n",
" <th>perf_6_month_avg</th>\n",
" <th>perf_12_month_avg</th>\n",
" <th>local_bo_qty</th>\n",
" <th>deck_risk</th>\n",
" <th>oe_constraint</th>\n",
" <th>ppap_risk</th>\n",
" <th>stop_auto_buy</th>\n",
" <th>rev_stop</th>\n",
" <th>went_on_backorder</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>6.158900e+04</td>\n",
" <td>61589.000000</td>\n",
" <td>58186.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>6.158900e+04</td>\n",
" <td>6.158900e+04</td>\n",
" <td>6.158900e+04</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>6.158900e+04</td>\n",
" <td>...</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>48145</td>\n",
" <td>61577</td>\n",
" <td>53792</td>\n",
" <td>59303</td>\n",
" <td>61569</td>\n",
" <td>50296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2.037188e+06</td>\n",
" <td>287.721882</td>\n",
" <td>7.559619</td>\n",
" <td>30.192843</td>\n",
" <td>1.692728e+02</td>\n",
" <td>3.150413e+02</td>\n",
" <td>4.535760e+02</td>\n",
" <td>44.742957</td>\n",
" <td>150.732631</td>\n",
" <td>2.835465e+02</td>\n",
" <td>...</td>\n",
" <td>1.605400</td>\n",
" <td>-6.264182</td>\n",
" <td>-5.863664</td>\n",
" <td>1.205361</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>6.564178e+05</td>\n",
" <td>4233.906931</td>\n",
" <td>6.498952</td>\n",
" <td>792.869253</td>\n",
" <td>5.286742e+03</td>\n",
" <td>9.774362e+03</td>\n",
" <td>1.420201e+04</td>\n",
" <td>1373.805831</td>\n",
" <td>5224.959649</td>\n",
" <td>8.872270e+03</td>\n",
" <td>...</td>\n",
" <td>42.309229</td>\n",
" <td>25.537906</td>\n",
" <td>24.844514</td>\n",
" <td>29.981155</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.068628e+06</td>\n",
" <td>-2999.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>-99.000000</td>\n",
" <td>-99.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>1.498574e+06</td>\n",
" <td>3.000000</td>\n",
" <td>4.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.620000</td>\n",
" <td>0.640000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.898033e+06</td>\n",
" <td>10.000000</td>\n",
" <td>8.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>4.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.820000</td>\n",
" <td>0.800000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2.314826e+06</td>\n",
" <td>57.000000</td>\n",
" <td>8.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.200000e+01</td>\n",
" <td>2.500000e+01</td>\n",
" <td>3.600000e+01</td>\n",
" <td>6.000000</td>\n",
" <td>17.000000</td>\n",
" <td>3.400000e+01</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.960000</td>\n",
" <td>0.950000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>3.284895e+06</td>\n",
" <td>673445.000000</td>\n",
" <td>52.000000</td>\n",
" <td>170976.000000</td>\n",
" <td>1.126656e+06</td>\n",
" <td>2.094336e+06</td>\n",
" <td>3.062016e+06</td>\n",
" <td>295197.000000</td>\n",
" <td>934593.000000</td>\n",
" <td>1.799099e+06</td>\n",
" <td>...</td>\n",
" <td>7392.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2999.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>11 rows × 23 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-60908db2-7f9e-422e-93eb-15cca20acb09')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-60908db2-7f9e-422e-93eb-15cca20acb09 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-60908db2-7f9e-422e-93eb-15cca20acb09');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 8
}
],
"source": [
"data.describe(include='all')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KZDoYHk0xhDA"
},
"source": [
"## Data type"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-UAkhs1PxhDA",
"outputId": "12ae5241-9443-43df-a556-2e32d88334c7"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"sku int64\n",
"national_inv int64\n",
"lead_time float64\n",
"in_transit_qty int64\n",
"forecast_3_month int64\n",
"forecast_6_month int64\n",
"forecast_9_month int64\n",
"sales_1_month int64\n",
"sales_3_month int64\n",
"sales_6_month int64\n",
"sales_9_month int64\n",
"min_bank int64\n",
"potential_issue object\n",
"pieces_past_due int64\n",
"perf_6_month_avg float64\n",
"perf_12_month_avg float64\n",
"local_bo_qty int64\n",
"deck_risk object\n",
"oe_constraint object\n",
"ppap_risk object\n",
"stop_auto_buy object\n",
"rev_stop object\n",
"went_on_backorder object\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 10
}
],
"source": [
"data.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LVqaayr-xhDA"
},
"source": [
"__Observations__\n",
"\n",
"* sku is `categorical` but is interpreted as `int64` \n",
"* potential_issue, deck_risk, oe_constraint, ppap_risk, stop_auto_buy, rev_stop, and went_on_backorder are also `categorical` but is interpreted as `object`. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7NsyOdctxhDA"
},
"source": [
"# Data pre-processing"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hzWZrWYcxhDA"
},
"source": [
"## Convert all the attributes to appropriate type"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lSCup59axhDA"
},
"source": [
"Data type conversion\n",
"\n",
" Using astype('category') to convert potential_issue, deck_risk, oe_constraint, ppap_risk, stop_auto_buy, rev_stop, and went_on_backorder attributes to categorical attributes.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "9fhWJkKrxhDA"
},
"outputs": [],
"source": [
"for col in ['sku', 'potential_issue', 'deck_risk', 'oe_constraint', 'ppap_risk', \n",
" 'stop_auto_buy', 'rev_stop', 'went_on_backorder']:\n",
" data[col] = data[col].astype('category')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5hPB6Dy_xhDA"
},
"source": [
"### Re-display data type of each variable"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CSlF8lJZxhDA",
"outputId": "37ec1ed4-2c75-4960-ce47-55945711aba5"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"sku category\n",
"national_inv int64\n",
"lead_time float64\n",
"in_transit_qty int64\n",
"forecast_3_month int64\n",
"forecast_6_month int64\n",
"forecast_9_month int64\n",
"sales_1_month int64\n",
"sales_3_month int64\n",
"sales_6_month int64\n",
"sales_9_month int64\n",
"min_bank int64\n",
"potential_issue category\n",
"pieces_past_due int64\n",
"perf_6_month_avg float64\n",
"perf_12_month_avg float64\n",
"local_bo_qty int64\n",
"deck_risk category\n",
"oe_constraint category\n",
"ppap_risk category\n",
"stop_auto_buy category\n",
"rev_stop category\n",
"went_on_backorder category\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"data.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "34zwxm4axhDA"
},
"source": [
"### Statistic summary \n",
" Using describe function"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 488
},
"id": "nBkx1E-IxhDA",
"outputId": "fce3fcee-2d36-4a22-8feb-a90f1b5d6f13"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" sku national_inv lead_time in_transit_qty \\\n",
"count 61589.0 61589.000000 58186.000000 61589.000000 \n",
"unique 61589.0 NaN NaN NaN \n",
"top 1068628.0 NaN NaN NaN \n",
"freq 1.0 NaN NaN NaN \n",
"mean NaN 287.721882 7.559619 30.192843 \n",
"std NaN 4233.906931 6.498952 792.869253 \n",
"min NaN -2999.000000 0.000000 0.000000 \n",
"25% NaN 3.000000 4.000000 0.000000 \n",
"50% NaN 10.000000 8.000000 0.000000 \n",
"75% NaN 57.000000 8.000000 0.000000 \n",
"max NaN 673445.000000 52.000000 170976.000000 \n",
"\n",
" forecast_3_month forecast_6_month forecast_9_month sales_1_month \\\n",
"count 6.158900e+04 6.158900e+04 6.158900e+04 61589.000000 \n",
"unique NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN \n",
"mean 1.692728e+02 3.150413e+02 4.535760e+02 44.742957 \n",
"std 5.286742e+03 9.774362e+03 1.420201e+04 1373.805831 \n",
"min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"50% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n",
"75% 1.200000e+01 2.500000e+01 3.600000e+01 6.000000 \n",
"max 1.126656e+06 2.094336e+06 3.062016e+06 295197.000000 \n",
"\n",
" sales_3_month sales_6_month ... pieces_past_due perf_6_month_avg \\\n",
"count 61589.000000 6.158900e+04 ... 61589.000000 61589.000000 \n",
"unique NaN NaN ... NaN NaN \n",
"top NaN NaN ... NaN NaN \n",
"freq NaN NaN ... NaN NaN \n",
"mean 150.732631 2.835465e+02 ... 1.605400 -6.264182 \n",
"std 5224.959649 8.872270e+03 ... 42.309229 25.537906 \n",
"min 0.000000 0.000000e+00 ... 0.000000 -99.000000 \n",
"25% 0.000000 0.000000e+00 ... 0.000000 0.620000 \n",
"50% 2.000000 4.000000e+00 ... 0.000000 0.820000 \n",
"75% 17.000000 3.400000e+01 ... 0.000000 0.960000 \n",
"max 934593.000000 1.799099e+06 ... 7392.000000 1.000000 \n",
"\n",
" perf_12_month_avg local_bo_qty deck_risk oe_constraint ppap_risk \\\n",
"count 61589.000000 61589.000000 61589 61589 61589 \n",
"unique NaN NaN 2 2 2 \n",
"top NaN NaN No No No \n",
"freq NaN NaN 48145 61577 53792 \n",
"mean -5.863664 1.205361 NaN NaN NaN \n",
"std 24.844514 29.981155 NaN NaN NaN \n",
"min -99.000000 0.000000 NaN NaN NaN \n",
"25% 0.640000 0.000000 NaN NaN NaN \n",
"50% 0.800000 0.000000 NaN NaN NaN \n",
"75% 0.950000 0.000000 NaN NaN NaN \n",
"max 1.000000 2999.000000 NaN NaN NaN \n",
"\n",
" stop_auto_buy rev_stop went_on_backorder \n",
"count 61589 61589 61589 \n",
"unique 2 2 2 \n",
"top Yes No No \n",
"freq 59303 61569 50296 \n",
"mean NaN NaN NaN \n",
"std NaN NaN NaN \n",
"min NaN NaN NaN \n",
"25% NaN NaN NaN \n",
"50% NaN NaN NaN \n",
"75% NaN NaN NaN \n",
"max NaN NaN NaN \n",
"\n",
"[11 rows x 23 columns]"
],
"text/html": [
"\n",
" <div id=\"df-66ce02d0-8602-4405-9245-ebbfc02dabc0\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sku</th>\n",
" <th>national_inv</th>\n",
" <th>lead_time</th>\n",
" <th>in_transit_qty</th>\n",
" <th>forecast_3_month</th>\n",
" <th>forecast_6_month</th>\n",
" <th>forecast_9_month</th>\n",
" <th>sales_1_month</th>\n",
" <th>sales_3_month</th>\n",
" <th>sales_6_month</th>\n",
" <th>...</th>\n",
" <th>pieces_past_due</th>\n",
" <th>perf_6_month_avg</th>\n",
" <th>perf_12_month_avg</th>\n",
" <th>local_bo_qty</th>\n",
" <th>deck_risk</th>\n",
" <th>oe_constraint</th>\n",
" <th>ppap_risk</th>\n",
" <th>stop_auto_buy</th>\n",
" <th>rev_stop</th>\n",
" <th>went_on_backorder</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>61589.0</td>\n",
" <td>61589.000000</td>\n",
" <td>58186.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>6.158900e+04</td>\n",
" <td>6.158900e+04</td>\n",
" <td>6.158900e+04</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>6.158900e+04</td>\n",
" <td>...</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589.000000</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" <td>61589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>61589.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>1068628.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>1.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>48145</td>\n",
" <td>61577</td>\n",
" <td>53792</td>\n",
" <td>59303</td>\n",
" <td>61569</td>\n",
" <td>50296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>NaN</td>\n",
" <td>287.721882</td>\n",
" <td>7.559619</td>\n",
" <td>30.192843</td>\n",
" <td>1.692728e+02</td>\n",
" <td>3.150413e+02</td>\n",
" <td>4.535760e+02</td>\n",
" <td>44.742957</td>\n",
" <td>150.732631</td>\n",
" <td>2.835465e+02</td>\n",
" <td>...</td>\n",
" <td>1.605400</td>\n",
" <td>-6.264182</td>\n",
" <td>-5.863664</td>\n",
" <td>1.205361</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>NaN</td>\n",
" <td>4233.906931</td>\n",
" <td>6.498952</td>\n",
" <td>792.869253</td>\n",
" <td>5.286742e+03</td>\n",
" <td>9.774362e+03</td>\n",
" <td>1.420201e+04</td>\n",
" <td>1373.805831</td>\n",
" <td>5224.959649</td>\n",
" <td>8.872270e+03</td>\n",
" <td>...</td>\n",
" <td>42.309229</td>\n",
" <td>25.537906</td>\n",
" <td>24.844514</td>\n",
" <td>29.981155</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>NaN</td>\n",
" <td>-2999.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>-99.000000</td>\n",
" <td>-99.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>NaN</td>\n",
" <td>3.000000</td>\n",
" <td>4.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.620000</td>\n",
" <td>0.640000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>NaN</td>\n",
" <td>10.000000</td>\n",
" <td>8.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>4.000000e+00</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.820000</td>\n",
" <td>0.800000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>NaN</td>\n",
" <td>57.000000</td>\n",
" <td>8.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.200000e+01</td>\n",
" <td>2.500000e+01</td>\n",
" <td>3.600000e+01</td>\n",
" <td>6.000000</td>\n",
" <td>17.000000</td>\n",
" <td>3.400000e+01</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.960000</td>\n",
" <td>0.950000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>NaN</td>\n",
" <td>673445.000000</td>\n",
" <td>52.000000</td>\n",
" <td>170976.000000</td>\n",
" <td>1.126656e+06</td>\n",
" <td>2.094336e+06</td>\n",
" <td>3.062016e+06</td>\n",
" <td>295197.000000</td>\n",
" <td>934593.000000</td>\n",
" <td>1.799099e+06</td>\n",
" <td>...</td>\n",
" <td>7392.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2999.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>11 rows × 23 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-66ce02d0-8602-4405-9245-ebbfc02dabc0')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-66ce02d0-8602-4405-9245-ebbfc02dabc0 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-66ce02d0-8602-4405-9245-ebbfc02dabc0');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 13
}
],
"source": [
"data.describe(include='all')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "av_mckTJxhDA"
},
"source": [
"## Delete sku attribute"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rs1vspbBxhDA",
"outputId": "d4a3e71a-fbe6-4753-b69a-5c9efc78159e"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"61589"
]
},
"metadata": {},
"execution_count": 14
}
],
"source": [
"np.size(np.unique(data.sku, return_counts=True)[0])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "kQWI0_TaxhDB"
},
"outputs": [],
"source": [
"data.drop('sku', axis=1, inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WSH4_wtsxhDB"
},
"source": [
"## Missing Data\n",
"\n",
" Missing value analysis and dropping the records with missing values"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "h43X2p9nxhDB",
"outputId": "3117f64d-37fe-4b0c-d680-61f56340831c"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"national_inv 0\n",
"lead_time 3403\n",
"in_transit_qty 0\n",
"forecast_3_month 0\n",
"forecast_6_month 0\n",
"forecast_9_month 0\n",
"sales_1_month 0\n",
"sales_3_month 0\n",
"sales_6_month 0\n",
"sales_9_month 0\n",
"min_bank 0\n",
"potential_issue 0\n",
"pieces_past_due 0\n",
"perf_6_month_avg 0\n",
"perf_12_month_avg 0\n",
"local_bo_qty 0\n",
"deck_risk 0\n",
"oe_constraint 0\n",
"ppap_risk 0\n",
"stop_auto_buy 0\n",
"rev_stop 0\n",
"went_on_backorder 0\n",
"dtype: int64"
]
},
"metadata": {},
"execution_count": 16
}
],
"source": [
"data.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wldvSn5-xhDB"
},
"source": [
"Observing the number of records before and after missing value records removal"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SV0C8zp3xhDB",
"outputId": "60070dc6-527a-4b66-8fa1-a133671083db"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(61589, 22)\n"
]
}
],
"source": [
"print (data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Dypxfb4yxhDB"
},
"source": [
"Since the number of missing values is about 5% and as we have around 61K records. For initial analysis we ignore all these records"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "H7_U4VlFxhDB"
},
"outputs": [],
"source": [
"data = data.dropna(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Y3chXt9nxhDB",
"outputId": "a7fe1d99-b762-4004-c873-edd2905cd5f4"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"national_inv 0\n",
"lead_time 0\n",
"in_transit_qty 0\n",
"forecast_3_month 0\n",
"forecast_6_month 0\n",
"forecast_9_month 0\n",
"sales_1_month 0\n",
"sales_3_month 0\n",
"sales_6_month 0\n",
"sales_9_month 0\n",
"min_bank 0\n",
"potential_issue 0\n",
"pieces_past_due 0\n",
"perf_6_month_avg 0\n",
"perf_12_month_avg 0\n",
"local_bo_qty 0\n",
"deck_risk 0\n",
"oe_constraint 0\n",
"ppap_risk 0\n",
"stop_auto_buy 0\n",
"rev_stop 0\n",
"went_on_backorder 0\n",
"dtype: int64\n",
"(58186, 22)\n"
]
}
],
"source": [
"print(data.isnull().sum())\n",
"print(data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xt6dAt38xhDB"
},
"source": [
"## Train and test split"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5HSV8LU7xhDB"
},
"source": [
"### Target attribute distribution "
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eCInoazKxhDB",
"outputId": "7d046160-6bc7-49d3-f933-477ffca0062c"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"No 47217\n",
"Yes 10969\n",
"Name: went_on_backorder, dtype: int64"
]
},
"metadata": {},
"execution_count": 20
}
],
"source": [
"data['went_on_backorder'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AEVHt4KlxhDB",
"outputId": "8da50b46-4776-43c2-a1fc-98f947108730"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"No 81.148386\n",
"Yes 18.851614\n",
"Name: went_on_backorder, dtype: float64"
]
},
"metadata": {},
"execution_count": 21
}
],
"source": [
"data['went_on_backorder'].value_counts(normalize=True)*100"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NZRr2tOyxhDB"
},
"source": [
"### Split the data into train and test\n",
"sklearn.model_selection.train_test_split\n",
"\n",
" Split arrays or matrices into random train and test subsets"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "OOFv5Xu3xhDB"
},
"outputs": [],
"source": [
"X = data.drop('went_on_backorder', axis=1)\n",
"y = data['went_on_backorder']\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=123) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ILOKpXB4xhDB"
},
"source": [
"### Target attribute distribution after the split"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OKir3gPwxhDC",
"outputId": "a0e081b9-4588-4f8e-e880-4e95d1f7cef2"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"No 81.185858\n",
"Yes 18.814142\n",
"Name: went_on_backorder, dtype: float64\n",
"No 81.060953\n",
"Yes 18.939047\n",
"Name: went_on_backorder, dtype: float64\n"
]
}
],
"source": [
"print(pd.value_counts(y_train)/y_train.count() * 100)\n",
"\n",
"print(pd.value_counts(y_test) /y_test.count() * 100)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1GpCYP7yxhDC"
},
"source": [
"## Convert categorical target attribute to numeric\n",
"\n",
"Using `LabelEncoder` convert categorical target attribute __went_on_backorder__ to numeric"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "7vgm8gQJxhDC"
},
"outputs": [],
"source": [
"le = LabelEncoder()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W2N8blf1xhDC",
"outputId": "1248951e-2607-47e8-fa50-118a801b4938"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LabelEncoder()"
]
},
"metadata": {},
"execution_count": 25
}
],
"source": [
"le.fit(y_train)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "atM4q89HxhDC"
},
"outputs": [],
"source": [
"y_train = le.transform(y_train)\n",
"y_test = le.transform(y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1VT0kgSUxhDC"
},
"source": [
"### Target attribute distribution"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yAamoRKWxhDC",
"outputId": "3ad45cda-1c74-440c-8a6b-15120fa4ea81"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"40730"
]
},
"metadata": {},
"execution_count": 27
}
],
"source": [
"y_train.size"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NC2avNC0xhDC",
"outputId": "63d86284-4b04-4709-9760-32bcf421dab2"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 81.185858\n",
"1 18.814142\n",
"dtype: float64"
]
},
"metadata": {},
"execution_count": 28
}
],
"source": [
"pd.value_counts(y_train)/y_train.size * 100"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aND4R5cMxhDC"
},
"source": [
"## Checking the data types"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BPkf7sTVxhDC",
"outputId": "952b9ab1-4902-47a1-e785-df7d1de1b305"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"national_inv int64\n",
"lead_time float64\n",
"in_transit_qty int64\n",
"forecast_3_month int64\n",
"forecast_6_month int64\n",
"forecast_9_month int64\n",
"sales_1_month int64\n",
"sales_3_month int64\n",
"sales_6_month int64\n",
"sales_9_month int64\n",
"min_bank int64\n",
"potential_issue category\n",
"pieces_past_due int64\n",
"perf_6_month_avg float64\n",
"perf_12_month_avg float64\n",
"local_bo_qty int64\n",
"deck_risk category\n",
"oe_constraint category\n",
"ppap_risk category\n",
"stop_auto_buy category\n",
"rev_stop category\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 29
}
],
"source": [
"X_train.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QVuog5a2xhDC"
},
"source": [
"## Standardize the numerical attributes\n",
"\n",
"__Note__: For Decision Tree and Random Forest Numeric attributes need not be standardized. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-h3a8Z_rxhDC"
},
"source": [
"### Store numerical attributes name"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ISj69f_vxhDC",
"outputId": "48a47c27-7fcc-463a-bdbd-06bd5620f8ab"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['national_inv', 'lead_time', 'in_transit_qty', 'forecast_3_month',\n",
" 'forecast_6_month', 'forecast_9_month', 'sales_1_month',\n",
" 'sales_3_month', 'sales_6_month', 'sales_9_month', 'min_bank',\n",
" 'pieces_past_due', 'perf_6_month_avg', 'perf_12_month_avg',\n",
" 'local_bo_qty'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 30
}
],
"source": [
"num_attr = X_train.select_dtypes(include=['float64', 'int64']).columns\n",
"num_attr"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JXCRchW2xhDC"
},
"source": [
"### Using StandardScaler, standardize the numerical attributes "
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "R6Ba6FvexhDC",
"outputId": "153a6d20-399c-42b8-cbd1-0d40158b13ff"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"StandardScaler()"
]
},
"metadata": {},
"execution_count": 31
}
],
"source": [
"scaler = StandardScaler()\n",
"scaler.fit(X_train[num_attr])"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"id": "QkiytUbmxhDC"
},
"outputs": [],
"source": [
"X_train_std = scaler.transform(X_train[num_attr])\n",
"X_test_std = scaler.transform(X_test[num_attr])"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "o9DTwAGLxhDE",
"outputId": "93faf1df-0ef3-492a-8f04-fcb6f044eeb4"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(40730, 15)\n",
"(17456, 15)\n"
]
}
],
"source": [
"print(X_train_std.shape)\n",
"print(X_test_std.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cshhawYDxhDE"
},
"source": [
"## Converting Categorical attributes to Numeric attributes"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GyWBVlqsxhDE"
},
"source": [
"### Store categorical attributes name"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EDVeZZEfxhDE",
"outputId": "6bb80261-9cbc-4fea-8ea0-bb2f7a2547d8"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['potential_issue', 'deck_risk', 'oe_constraint', 'ppap_risk',\n",
" 'stop_auto_buy', 'rev_stop'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 34
}
],
"source": [
"cat_attr = X_train.select_dtypes(include=['category']).columns\n",
"cat_attr\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yo1Xy4dLxhDE"
},
"source": [
"### Using OneHotEncoder, converting Categorical attributes to Numeric attributes "
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "X-hSOJ_0xhDE",
"outputId": "7b8424e1-bed7-431d-c9d5-265aa7bfbfe4"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"OneHotEncoder(drop='first')"
]
},
"metadata": {},
"execution_count": 35
}
],
"source": [
"enc = OneHotEncoder(drop = 'first')\n",
"\n",
"enc.fit(X_train[cat_attr])"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"id": "CuD8Yw8SxhDE"
},
"outputs": [],
"source": [
"X_train_ohe=enc.transform(X_train[cat_attr]).toarray()\n",
"X_test_ohe=enc.transform(X_test[cat_attr]).toarray()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "srf58_S-xhDE"
},
"source": [
"## Concatenate attribute\n",
"Standardised numeric attributes and One hot encoded categorical attributes"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"id": "-SUrhnsExhDE"
},
"outputs": [],
"source": [
"X_train_con = np.concatenate([X_train_std, X_train_ohe], axis=1)\n",
"X_test_con = np.concatenate([X_test_std, X_test_ohe], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gTzB871txhDE",
"outputId": "acf7624c-eec4-4f69-81a0-b641eb625048"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(40730, 21)"
]
},
"metadata": {},
"execution_count": 38
}
],
"source": [
"X_train_con.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XmizzhKUxhDF"
},
"source": [
"# Model building"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ExZZL5a6xhDF"
},
"source": [
"## RandomForestClassifier Model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YP0ktg6vxhDF"
},
"source": [
"### Instantiate Model"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"id": "bEq9bNzsxhDF"
},
"outputs": [],
"source": [
"clf1 = RandomForestClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5lJQpibExhDF"
},
"source": [
"https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N5zxC9l1xhDF"
},
"source": [
"### Train Model"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NmBv5VYnxhDF",
"outputId": "48b9b715-5941-4af8-b3c2-700c19ff8cb4"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"RandomForestClassifier()"
]
},
"metadata": {},
"execution_count": 40
}
],
"source": [
"clf1.fit(X_train_con, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o33ejCyGxhDF"
},
"source": [
"### List important features"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uJFQBf2xxhDF"
},
"source": [
"Here is the python code which can be used for determining feature importance. The attribute, feature_importances_ gives the importance of each feature in the order in which the features are arranged in training dataset. Note how the indices are arranged in descending order while using argsort method (most important feature appears first)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zvRI65lZxhDF",
"outputId": "38d5e16f-9c55-4ec3-a723-5227d4410259"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[2.28340499e-01 3.18093993e-02 4.27624407e-02 1.04344775e-01\n",
" 8.25886632e-02 8.55111021e-02 4.37865203e-02 5.05847135e-02\n",
" 5.15601421e-02 6.06862325e-02 3.73390686e-02 1.06594152e-02\n",
" 6.35983371e-02 6.41008120e-02 2.14548483e-02 3.34686407e-04\n",
" 9.26717770e-03 6.31896209e-05 8.64372879e-03 2.55328309e-03\n",
" 1.09657295e-05]\n",
"[ 0 3 5 4 13 12 9 8 7 6 2 10 1 14 11 16 18 19 15 17 20]\n"
]
}
],
"source": [
"importances = clf1.feature_importances_\n",
"print(importances)\n",
"\n",
"indices = np.argsort(importances)[::-1]\n",
"print(indices)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 205
},
"id": "X6UeC2vSxhDF",
"outputId": "29a30b74-3a6f-4117-a516-af0524a16e19"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" 0 1 2 3 \\\n",
"0 national_inv forecast_3_month forecast_9_month forecast_6_month \n",
"1 0.22834 0.104345 0.085511 0.082589 \n",
"\n",
" 4 5 6 7 \\\n",
"0 perf_12_month_avg perf_6_month_avg sales_9_month sales_6_month \n",
"1 0.064101 0.063598 0.060686 0.05156 \n",
"\n",
" 8 9 ... 11 12 13 \\\n",
"0 sales_3_month sales_1_month ... min_bank lead_time local_bo_qty \n",
"1 0.050585 0.043787 ... 0.037339 0.031809 0.021455 \n",
"\n",
" 14 15 16 17 18 \\\n",
"0 pieces_past_due deck_risk ppap_risk stop_auto_buy potential_issue \n",
"1 0.010659 0.009267 0.008644 0.002553 0.000335 \n",
"\n",
" 19 20 \n",
"0 oe_constraint rev_stop \n",
"1 0.000063 0.000011 \n",
"\n",
"[2 rows x 21 columns]"
],
"text/html": [
"\n",
" <div id=\"df-8e57fe56-8c83-4112-ab1b-d907b27908f0\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>11</th>\n",
" <th>12</th>\n",
" <th>13</th>\n",
" <th>14</th>\n",
" <th>15</th>\n",
" <th>16</th>\n",
" <th>17</th>\n",
" <th>18</th>\n",
" <th>19</th>\n",
" <th>20</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>national_inv</td>\n",
" <td>forecast_3_month</td>\n",
" <td>forecast_9_month</td>\n",
" <td>forecast_6_month</td>\n",
" <td>perf_12_month_avg</td>\n",
" <td>perf_6_month_avg</td>\n",
" <td>sales_9_month</td>\n",
" <td>sales_6_month</td>\n",
" <td>sales_3_month</td>\n",
" <td>sales_1_month</td>\n",
" <td>...</td>\n",
" <td>min_bank</td>\n",
" <td>lead_time</td>\n",
" <td>local_bo_qty</td>\n",
" <td>pieces_past_due</td>\n",
" <td>deck_risk</td>\n",
" <td>ppap_risk</td>\n",
" <td>stop_auto_buy</td>\n",
" <td>potential_issue</td>\n",
" <td>oe_constraint</td>\n",
" <td>rev_stop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.22834</td>\n",
" <td>0.104345</td>\n",
" <td>0.085511</td>\n",
" <td>0.082589</td>\n",
" <td>0.064101</td>\n",
" <td>0.063598</td>\n",
" <td>0.060686</td>\n",
" <td>0.05156</td>\n",
" <td>0.050585</td>\n",
" <td>0.043787</td>\n",
" <td>...</td>\n",
" <td>0.037339</td>\n",
" <td>0.031809</td>\n",
" <td>0.021455</td>\n",
" <td>0.010659</td>\n",
" <td>0.009267</td>\n",
" <td>0.008644</td>\n",
" <td>0.002553</td>\n",
" <td>0.000335</td>\n",
" <td>0.000063</td>\n",
" <td>0.000011</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows × 21 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8e57fe56-8c83-4112-ab1b-d907b27908f0')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-8e57fe56-8c83-4112-ab1b-d907b27908f0 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-8e57fe56-8c83-4112-ab1b-d907b27908f0');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 42
}
],
"source": [
"ind_attr_names = np.concatenate((num_attr, cat_attr), axis=0)\n",
"\n",
"pd.DataFrame([ind_attr_names[indices], np.sort(importances)[::-1]])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "alceiTilxhDF"
},
"source": [
"### Predict"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"id": "Sj6QdzYCxhDF"
},
"outputs": [],
"source": [
"train_pred = clf1.predict(X_train_con)\n",
"test_pred = clf1.predict(X_test_con)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0meVhktoxhDF"
},
"source": [
"### Evaluate"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mHTd0nm0xhDF",
"outputId": "ec8bc838-81ea-4e7b-bd81-4dd85c66a59e"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--Train--\n",
"Confusion Matrix \n",
" [[33032 35]\n",
" [ 54 7609]]\n",
"Accurcay : 0.9978148784679597\n",
"Recall : 0.9929531515072426\n",
"Precision: 0.9954212454212454\n",
"--Test--\n",
"Confusion Matrix \n",
" [[13615 535]\n",
" [ 608 2698]]\n",
"Accurcay : 0.9345210815765352\n",
"Recall : 0.8160919540229885\n",
"Precision: 0.8345190225796474\n"
]
}
],
"source": [
"print(\"--Train--\")\n",
"evaluate_model(y_train, train_pred)\n",
"print(\"--Test--\")\n",
"evaluate_model(y_test, test_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fNmD-5JKxhDF"
},
"source": [
"## Up-sampling \n",
"\n",
" Using SMOTE (Synthetic Minority Over-sampling Technique)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "avmGYRDuxhDF"
},
"source": [
"### Instantiate SMOTE"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"id": "EVOSjq75xhDF"
},
"outputs": [],
"source": [
"smote = SMOTE(random_state=123)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6QachIEgxhDF"
},
"source": [
"### Fit Sample"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"id": "ikkOwEIQxhDF"
},
"outputs": [],
"source": [
"X_train_sm, y_train_sm = smote.fit_resample(X_train_con, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QO8eOrsZxhDF",
"outputId": "743776d6-b58e-4615-a185-f19f55b9a7df"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([0, 1]), array([33067, 7663]))"
]
},
"metadata": {},
"execution_count": 47
}
],
"source": [
"np.unique(y_train, return_counts= True)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XK_Iu5ZYxhDF",
"outputId": "bfc3160c-47f9-4b93-f264-913cc7e0a015"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([0, 1]), array([33067, 33067]))"
]
},
"metadata": {},
"execution_count": 48
}
],
"source": [
"np.unique(y_train_sm, return_counts= True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5oAlhm-kxhDF"
},
"source": [
"## RandomForestClassifier with up-sample data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SYG4XR57xhDG"
},
"source": [
"### Instantiate Model"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"id": "Qb_Yf9_nxhDG"
},
"outputs": [],
"source": [
"clf2 = RandomForestClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LZ7h_VJaxhDG"
},
"source": [
"### Train the model"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oqCqnWOtxhDG",
"outputId": "c71639ba-21e9-474f-d0a6-d8985dc4512c"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"RandomForestClassifier()"
]
},
"metadata": {},
"execution_count": 50
}
],
"source": [
"clf2.fit(X_train_sm, y_train_sm)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ijvcOXF9xhDG"
},
"source": [
"### List important features"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0ib1kFrFxhDG",
"outputId": "ae89dfc5-dab7-4d95-d7ea-8ccf1881f0cb"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[1.96528103e-01 2.67590210e-02 3.63421596e-02 1.47591376e-01\n",
" 1.26087461e-01 1.29842361e-01 4.86239696e-02 5.03434848e-02\n",
" 4.09201696e-02 4.09172087e-02 2.85915301e-02 8.52679461e-03\n",
" 4.33739851e-02 4.51843154e-02 1.53286652e-02 1.78528790e-04\n",
" 6.48161773e-03 4.05954374e-05 6.02825932e-03 2.30010368e-03\n",
" 1.02904891e-05]\n",
"[ 0 3 5 4 7 6 13 12 8 9 2 10 1 14 11 16 18 19 15 17 20]\n"
]
}
],
"source": [
"importances = clf2.feature_importances_\n",
"print(importances)\n",
"\n",
"indices = np.argsort(importances)[::-1]\n",
"print(indices)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 205
},
"id": "PO5DX2bIxhDG",
"outputId": "15924308-537d-4c06-d4e0-8ed09d1eb7a0"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" 0 1 2 3 \\\n",
"0 national_inv forecast_3_month forecast_9_month forecast_6_month \n",
"1 0.196528 0.147591 0.129842 0.126087 \n",
"\n",
" 4 5 6 7 \\\n",
"0 sales_3_month sales_1_month perf_12_month_avg perf_6_month_avg \n",
"1 0.050343 0.048624 0.045184 0.043374 \n",
"\n",
" 8 9 ... 11 12 13 \\\n",
"0 sales_6_month sales_9_month ... min_bank lead_time local_bo_qty \n",
"1 0.04092 0.040917 ... 0.028592 0.026759 0.015329 \n",
"\n",
" 14 15 16 17 18 \\\n",
"0 pieces_past_due deck_risk ppap_risk stop_auto_buy potential_issue \n",
"1 0.008527 0.006482 0.006028 0.0023 0.000179 \n",
"\n",
" 19 20 \n",
"0 oe_constraint rev_stop \n",
"1 0.000041 0.00001 \n",
"\n",
"[2 rows x 21 columns]"
],
"text/html": [
"\n",
" <div id=\"df-e1dcd0a0-99af-4514-9f18-6544a374b967\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>11</th>\n",
" <th>12</th>\n",
" <th>13</th>\n",
" <th>14</th>\n",
" <th>15</th>\n",
" <th>16</th>\n",
" <th>17</th>\n",
" <th>18</th>\n",
" <th>19</th>\n",
" <th>20</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>national_inv</td>\n",
" <td>forecast_3_month</td>\n",
" <td>forecast_9_month</td>\n",
" <td>forecast_6_month</td>\n",
" <td>sales_3_month</td>\n",
" <td>sales_1_month</td>\n",
" <td>perf_12_month_avg</td>\n",
" <td>perf_6_month_avg</td>\n",
" <td>sales_6_month</td>\n",
" <td>sales_9_month</td>\n",
" <td>...</td>\n",
" <td>min_bank</td>\n",
" <td>lead_time</td>\n",
" <td>local_bo_qty</td>\n",
" <td>pieces_past_due</td>\n",
" <td>deck_risk</td>\n",
" <td>ppap_risk</td>\n",
" <td>stop_auto_buy</td>\n",
" <td>potential_issue</td>\n",
" <td>oe_constraint</td>\n",
" <td>rev_stop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.196528</td>\n",
" <td>0.147591</td>\n",
" <td>0.129842</td>\n",
" <td>0.126087</td>\n",
" <td>0.050343</td>\n",
" <td>0.048624</td>\n",
" <td>0.045184</td>\n",
" <td>0.043374</td>\n",
" <td>0.04092</td>\n",
" <td>0.040917</td>\n",
" <td>...</td>\n",
" <td>0.028592</td>\n",
" <td>0.026759</td>\n",
" <td>0.015329</td>\n",
" <td>0.008527</td>\n",
" <td>0.006482</td>\n",
" <td>0.006028</td>\n",
" <td>0.0023</td>\n",
" <td>0.000179</td>\n",
" <td>0.000041</td>\n",
" <td>0.00001</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows × 21 columns</p>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e1dcd0a0-99af-4514-9f18-6544a374b967')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-e1dcd0a0-99af-4514-9f18-6544a374b967 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-e1dcd0a0-99af-4514-9f18-6544a374b967');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 52
}
],
"source": [
"pd.DataFrame([ind_attr_names[indices], np.sort(importances)[::-1]])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Wac7V-gQxhDG"
},
"source": [
"### Predict"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"id": "Mz0vLbPjxhDG"
},
"outputs": [],
"source": [
"train_pred = clf2.predict(X_train_sm)\n",
"test_pred = clf2.predict(X_test_con)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K0bRFNs3xhDG"
},
"source": [
"### Evaluate"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qNdsqlu9xhDG",
"outputId": "078c1c9a-878c-43ce-f27c-2e1133675d2e"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--Train--\n",
"Confusion Matrix \n",
" [[33010 57]\n",
" [ 50 33017]]\n",
"Accurcay : 0.9983820727613633\n",
"Recall : 0.9984879184685638\n",
"Precision: 0.9982765918848643\n",
"--Test--\n",
"Confusion Matrix \n",
" [[13317 833]\n",
" [ 409 2897]]\n",
"Accurcay : 0.9288496791934006\n",
"Recall : 0.8762855414398064\n",
"Precision: 0.7766756032171582\n"
]
}
],
"source": [
"print(\"--Train--\")\n",
"evaluate_model(y_train_sm, train_pred)\n",
"print(\"--Test--\")\n",
"evaluate_model(y_test, test_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zIXcOn6YxhDG"
},
"source": [
"## Hyper-parameter tuning using Grid Search and Cross Validation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "larzdmadxhDG"
},
"source": [
"### Parameters to test"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"id": "CxAeQnBFxhDG"
},
"outputs": [],
"source": [
"param_grid = {\"n_estimators\" : [50, 100],\n",
" \"max_depth\" : [1,5],\n",
" \"max_features\" : [3, 5],\n",
" \"min_samples_leaf\" : [1, 2, 4]}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4T8qjETyxhDG"
},
"source": [
"### Instantiate Random Forest"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"id": "ReBJN9Q9xhDG"
},
"outputs": [],
"source": [
"clf3 = RandomForestClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZOupNaHYxhDG"
},
"source": [
"### Instantiate GridSearchCV "
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"id": "OLMpiiG8xhDG"
},
"outputs": [],
"source": [
"clf_grid = GridSearchCV(clf2, param_grid, cv=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AWcqYnrhxhDG"
},
"source": [
"### Train Random Forest using GridSearchCV"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MTr6W8uHxhDG",
"outputId": "a9a8d9a2-0064-45fe-e23e-51aa1eb982c4"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"GridSearchCV(cv=2, estimator=RandomForestClassifier(),\n",
" param_grid={'max_depth': [1, 5], 'max_features': [3, 5],\n",
" 'min_samples_leaf': [1, 2, 4],\n",
" 'n_estimators': [50, 100]})"
]
},
"metadata": {},
"execution_count": 58
}
],
"source": [
"clf_grid.fit(X_train_sm, y_train_sm)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OPrLMU6yxhDG"
},
"source": [
"### Best Params"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sK1vaIb6xhDH",
"outputId": "facaa64b-a02b-431a-f71d-e0ba44893793"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'max_depth': 5, 'max_features': 5, 'min_samples_leaf': 2, 'n_estimators': 100}"
]
},
"metadata": {},
"execution_count": 59
}
],
"source": [
"clf_grid.best_params_"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XkXCsNptxhDH"
},
"source": [
"### Predict "
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"id": "G-Ctw-rqxhDH"
},
"outputs": [],
"source": [
"train_pred = clf_grid.predict(X_train_sm)\n",
"test_pred = clf_grid.predict(X_test_con)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "By2BOFM0xhDH"
},
"source": [
"### Evaluate"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wjwaJZZqxhDH",
"outputId": "e3c7255f-42b5-4b56-8fc8-8b9954d2c4d2"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--Train--\n",
"Confusion Matrix \n",
" [[27992 5075]\n",
" [ 3260 29807]]\n",
"Accurcay : 0.8739680043547948\n",
"Recall : 0.9014122841503613\n",
"Precision: 0.8545094891347973\n",
"--Test--\n",
"Confusion Matrix \n",
" [[11904 2246]\n",
" [ 490 2816]]\n",
"Accurcay : 0.843263061411549\n",
"Recall : 0.8517846339987901\n",
"Precision: 0.5563018569735283\n"
]
}
],
"source": [
"print(\"--Train--\")\n",
"evaluate_model(y_train_sm, train_pred)\n",
"print(\"--Test--\")\n",
"evaluate_model(y_test, test_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ij5BcRlaxhDH"
},
"source": [
"## Building RandomForestClassifier Model using Variable Importance"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8Hj1pehWxhDH",
"outputId": "e92afb68-f30b-4524-ea1d-0d63ab866d9d"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[ 4 0 3 5 6 7 2 14 9 8 10 12 13 1 11 18 16 19 15 20 17]\n",
"[4 0 3 5 6]\n"
]
}
],
"source": [
"importances = clf_grid.best_estimator_.feature_importances_\n",
"indices = np.argsort(importances)[::-1]\n",
"print(indices)\n",
"\n",
"select = indices[0:5]\n",
"print(select)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lBi7hDTYxhDH"
},
"source": [
"### Instantiate Model"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"id": "HoGjzxNGxhDH"
},
"outputs": [],
"source": [
"clf4 = RandomForestClassifier(max_depth=5, max_features=5,\n",
" min_samples_leaf=2, n_estimators=100)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1SJV2UhixhDH"
},
"source": [
"### Train the model"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jfjTecn_xhDH",
"outputId": "b8bf991b-3cb0-4a39-8281-f28bb1f8c5a7"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"RandomForestClassifier(max_depth=5, max_features=5, min_samples_leaf=2)"
]
},
"metadata": {},
"execution_count": 64
}
],
"source": [
"clf4.fit(X_train_sm[:,select], y_train_sm)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "18_3itMsxhDH"
},
"source": [
"### Predict"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"id": "DjrfRj8NxhDH"
},
"outputs": [],
"source": [
"train_pred = clf4.predict(X_train_sm[:,select])\n",
"test_pred = clf4.predict(X_test_con[:,select])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GT1z8vOdxhDH"
},
"source": [
"### Evaluate"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0t3vu_GJxhDH",
"outputId": "6fce2e69-5f05-419e-a321-287c2c388d0d"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--Train--\n",
"Confusion Matrix \n",
" [[29175 3892]\n",
" [ 3861 29206]]\n",
"Accurcay : 0.8827683188677533\n",
"Recall : 0.8832370641424986\n",
"Precision: 0.8824098132817693\n",
"--Test--\n",
"Confusion Matrix \n",
" [[12453 1697]\n",
" [ 600 2706]]\n",
"Accurcay : 0.8684120073327223\n",
"Recall : 0.8185117967332124\n",
"Precision: 0.614580967522144\n"
]
}
],
"source": [
"print(\"--Train--\")\n",
"evaluate_model(y_train_sm, train_pred)\n",
"print(\"--Test--\")\n",
"evaluate_model(y_test, test_pred)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "165px"
},
"toc_section_display": true,
"toc_window_display": false
},
"colab": {
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment