muellerzr/debuggingtabularissue.ipynb

## debuggingtabularissue.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "DebuggingTabularIssue.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyPLPsJRdI6vngHgslSY2/K4",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/muellerzr/c126e75b0265f88c3baa1ea50a4dfe6b/debuggingtabularissue.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OFjBvIDJGjBz"
      },
      "source": [
        "First install the dev versions:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uGmiTzBYwiDb",
        "outputId": "2b0dae90-687f-4d69-8277-80e1b2bb94d8",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "source": [
        "!pip install git+https://github.com/fastai/fastai -qqq\n",
        "!pip install git+https://github.com/fastai/fastcore -qqq"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "  Building wheel for fastai (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Building wheel for fastcore (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T2TgGzKAGmtz"
      },
      "source": [
        "Next we'll import the library:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "g7G6d684mM1x"
      },
      "source": [
        "from fastai.tabular.all import *"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8XxG8jkTnN7s"
      },
      "source": [
        "We will download the `ADULT_SAMPLE` dataset and load it into `Pandas`:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RoyiwS9CnK-l"
      },
      "source": [
        "path = untar_data(URLs.ADULT_SAMPLE)\n",
        "df = pd.read_csv(path/'adult.csv')"
      ],
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "grxUf3JHvB1f"
      },
      "source": [
        "cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']\n",
        "cont_names = ['age', 'fnlwgt', 'education-num']\n",
        "procs = [Categorify, FillMissing, Normalize]\n",
        "y_names = 'salary'\n",
        "y_block = CategoryBlock()"
      ],
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tWbwa_8iy8Sa"
      },
      "source": [
        "splits = RandomSplitter()(range_of(df))"
      ],
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "f5MHZdbfnZKp"
      },
      "source": [
        "to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,\n",
        "                   y_names=y_names, y_block=y_block, splits=splits)"
      ],
      "execution_count": 6,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WqxzBYEHttNx"
      },
      "source": [
        "dls = to.dataloaders(bs=200)\n",
        "learn = tabular_learner(dls, layers=[200,100])"
      ],
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ARODGoeXGtgl"
      },
      "source": [
        "Next we'll export the learner:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IVckiWKat2KT"
      },
      "source": [
        "learn.export(\"testing\")"
      ],
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D208e_aHGz6l"
      },
      "source": [
        "And force a reboot:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EiDMTZGw0fN0"
      },
      "source": [
        "exit()"
      ],
      "execution_count": 9,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6jBsmTp6HJTF"
      },
      "source": [
        "Next let's import fastai and our `muppy` helper. \n",
        "\n",
        "Now **at this point** we have zero references to a `DataFrame`, we can verify with `muppy`:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YZsM6_J2HSdY"
      },
      "source": [
        "from fastai.tabular.all import *\n",
        "from pympler import muppy"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RR_2KPirHQzp",
        "outputId": "d1826b04-9e35-40d3-982b-f329268a748a",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "source": [
        "all_objects = muppy.get_objects()\n",
        "my_types = muppy.filter(all_objects, Type=pd.DataFrame)\n",
        "len(my_types)"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py:126: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead\n",
            "  warnings.warn(\"torch.distributed.reduce_op is deprecated, please use \"\n"
          ],
          "name": "stderr"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "goP3S3DAHT2O"
      },
      "source": [
        "Let's try loading in our learner and see what happens:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Eaxl0nTyEMg7"
      },
      "source": [
        "learn = load_learner('testing')"
      ],
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1ECTynCc-0fG",
        "outputId": "1d734456-b3a7-40f2-e983-74b8b9d07015",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "source": [
        "all_objects = muppy.get_objects()\n",
        "my_types = muppy.filter(all_objects, Type=pd.DataFrame)\n",
        "len(my_types)"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py:126: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead\n",
            "  warnings.warn(\"torch.distributed.reduce_op is deprecated, please use \"\n"
          ],
          "name": "stderr"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "5"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MbBXuu5QHY2P"
      },
      "source": [
        "Suddenly we have five! In actuality we should only have really 2, our `train` (blank) and our `valid` (blank). We can also look at their values:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dD6uzTkuEg_j",
        "outputId": "aa1235fc-f0e4-4e84-d71b-84f997b2af81",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 101
        }
      },
      "source": [
        "my_types[0].head()"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>age</th>\n",
              "      <th>workclass</th>\n",
              "      <th>fnlwgt</th>\n",
              "      <th>education</th>\n",
              "      <th>education-num</th>\n",
              "      <th>marital-status</th>\n",
              "      <th>occupation</th>\n",
              "      <th>relationship</th>\n",
              "      <th>race</th>\n",
              "      <th>sex</th>\n",
              "      <th>capital-gain</th>\n",
              "      <th>capital-loss</th>\n",
              "      <th>hours-per-week</th>\n",
              "      <th>native-country</th>\n",
              "      <th>salary</th>\n",
              "      <th>education-num_na</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 10
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dMsUWGprHktJ",
        "outputId": "a39e4861-fc5d-431e-c090-cda97d5e356c",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 101
        }
      },
      "source": [
        "my_types[1].head()"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>age</th>\n",
              "      <th>workclass</th>\n",
              "      <th>fnlwgt</th>\n",
              "      <th>education</th>\n",
              "      <th>education-num</th>\n",
              "      <th>marital-status</th>\n",
              "      <th>occupation</th>\n",
              "      <th>relationship</th>\n",
              "      <th>race</th>\n",
              "      <th>sex</th>\n",
              "      <th>capital-gain</th>\n",
              "      <th>capital-loss</th>\n",
              "      <th>hours-per-week</th>\n",
              "      <th>native-country</th>\n",
              "      <th>salary</th>\n",
              "      <th>education-num_na</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UnPVbKJcHmAd",
        "outputId": "31e132bb-c19a-4dee-9b0e-ab2b939f194e",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 299
        }
      },
      "source": [
        "my_types[2].head()"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>age</th>\n",
              "      <th>workclass</th>\n",
              "      <th>fnlwgt</th>\n",
              "      <th>education</th>\n",
              "      <th>education-num</th>\n",
              "      <th>marital-status</th>\n",
              "      <th>occupation</th>\n",
              "      <th>relationship</th>\n",
              "      <th>race</th>\n",
              "      <th>sex</th>\n",
              "      <th>capital-gain</th>\n",
              "      <th>capital-loss</th>\n",
              "      <th>hours-per-week</th>\n",
              "      <th>native-country</th>\n",
              "      <th>salary</th>\n",
              "      <th>education-num_na</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>19109</th>\n",
              "      <td>-0.626635</td>\n",
              "      <td>5</td>\n",
              "      <td>1.069460</td>\n",
              "      <td>12</td>\n",
              "      <td>-0.424470</td>\n",
              "      <td>5</td>\n",
              "      <td>8</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>40</td>\n",
              "      <td>United-States</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22624</th>\n",
              "      <td>-1.434871</td>\n",
              "      <td>1</td>\n",
              "      <td>-1.257606</td>\n",
              "      <td>16</td>\n",
              "      <td>-0.030491</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>4</td>\n",
              "      <td>5</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>30</td>\n",
              "      <td>Japan</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>29269</th>\n",
              "      <td>0.034649</td>\n",
              "      <td>5</td>\n",
              "      <td>-0.333725</td>\n",
              "      <td>10</td>\n",
              "      <td>1.151445</td>\n",
              "      <td>3</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>45</td>\n",
              "      <td>United-States</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16316</th>\n",
              "      <td>0.181601</td>\n",
              "      <td>5</td>\n",
              "      <td>-0.696871</td>\n",
              "      <td>10</td>\n",
              "      <td>1.151445</td>\n",
              "      <td>3</td>\n",
              "      <td>7</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>40</td>\n",
              "      <td>Germany</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21530</th>\n",
              "      <td>-0.112303</td>\n",
              "      <td>3</td>\n",
              "      <td>-0.586142</td>\n",
              "      <td>16</td>\n",
              "      <td>-0.030491</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>40</td>\n",
              "      <td>United-States</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "            age  workclass    fnlwgt  ...  native-country  salary  education-num_na\n",
              "19109 -0.626635          5  1.069460  ...   United-States       0                 1\n",
              "22624 -1.434871          1 -1.257606  ...           Japan       0                 1\n",
              "29269  0.034649          5 -0.333725  ...   United-States       1                 1\n",
              "16316  0.181601          5 -0.696871  ...         Germany       0                 1\n",
              "21530 -0.112303          3 -0.586142  ...   United-States       0                 1\n",
              "\n",
              "[5 rows x 16 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QubxTc1CHp7h",
        "outputId": "bff79dc5-a652-4639-b8bd-033529ff0988",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 101
        }
      },
      "source": [
        "my_types[3].head()"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>age</th>\n",
              "      <th>workclass</th>\n",
              "      <th>fnlwgt</th>\n",
              "      <th>education</th>\n",
              "      <th>education-num</th>\n",
              "      <th>marital-status</th>\n",
              "      <th>occupation</th>\n",
              "      <th>relationship</th>\n",
              "      <th>race</th>\n",
              "      <th>sex</th>\n",
              "      <th>capital-gain</th>\n",
              "      <th>capital-loss</th>\n",
              "      <th>hours-per-week</th>\n",
              "      <th>native-country</th>\n",
              "      <th>salary</th>\n",
              "      <th>education-num_na</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 13
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0UXh4qucHr0j",
        "outputId": "fd210efc-65a1-4f28-ad83-b9e8888e6ccc",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 316
        }
      },
      "source": [
        "my_types[4].head()"
      ],
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>age</th>\n",
              "      <th>workclass</th>\n",
              "      <th>fnlwgt</th>\n",
              "      <th>education</th>\n",
              "      <th>education-num</th>\n",
              "      <th>marital-status</th>\n",
              "      <th>occupation</th>\n",
              "      <th>relationship</th>\n",
              "      <th>race</th>\n",
              "      <th>sex</th>\n",
              "      <th>capital-gain</th>\n",
              "      <th>capital-loss</th>\n",
              "      <th>hours-per-week</th>\n",
              "      <th>native-country</th>\n",
              "      <th>salary</th>\n",
              "      <th>education-num_na</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>24346</th>\n",
              "      <td>1.283741</td>\n",
              "      <td>7</td>\n",
              "      <td>-1.345207</td>\n",
              "      <td>16</td>\n",
              "      <td>-0.030491</td>\n",
              "      <td>3</td>\n",
              "      <td>5</td>\n",
              "      <td>6</td>\n",
              "      <td>5</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>1977</td>\n",
              "      <td>50</td>\n",
              "      <td>United-States</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12537</th>\n",
              "      <td>0.108125</td>\n",
              "      <td>8</td>\n",
              "      <td>-0.069612</td>\n",
              "      <td>13</td>\n",
              "      <td>1.545424</td>\n",
              "      <td>3</td>\n",
              "      <td>11</td>\n",
              "      <td>6</td>\n",
              "      <td>2</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>38</td>\n",
              "      <td>China</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5036</th>\n",
              "      <td>0.475505</td>\n",
              "      <td>5</td>\n",
              "      <td>-1.477885</td>\n",
              "      <td>10</td>\n",
              "      <td>1.151445</td>\n",
              "      <td>3</td>\n",
              "      <td>11</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>Male</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>50</td>\n",
              "      <td>United-States</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21718</th>\n",
              "      <td>-1.140967</td>\n",
              "      <td>5</td>\n",
              "      <td>-0.146010</td>\n",
              "      <td>10</td>\n",
              "      <td>1.151445</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>Female</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>40</td>\n",
              "      <td>United-States</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21744</th>\n",
              "      <td>-0.479683</td>\n",
              "      <td>5</td>\n",
              "      <td>0.307879</td>\n",
              "      <td>9</td>\n",
              "      <td>0.363488</td>\n",
              "      <td>3</td>\n",
              "      <td>11</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>Male</td>\n",
              "      <td>7298</td>\n",
              "      <td>0</td>\n",
              "      <td>42</td>\n",
              "      <td>United-States</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "            age  workclass    fnlwgt  ...  native-country  salary  education-num_na\n",
              "24346  1.283741          7 -1.345207  ...   United-States       1                 1\n",
              "12537  0.108125          8 -0.069612  ...           China       1                 1\n",
              "5036   0.475505          5 -1.477885  ...   United-States       1                 1\n",
              "21718 -1.140967          5 -0.146010  ...   United-States       0                 1\n",
              "21744 -0.479683          5  0.307879  ...   United-States       1                 1\n",
              "\n",
              "[5 rows x 16 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Hgc_7RoCHges"
      },
      "source": [
        "And we can see a copy of our train and validation dataframes, *not* what we want!\n",
        "\n",
        "The issue is I don't know where these originated from. When you try to investigate the size of the `DataLoader` with:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iWZd-gPyFYGE"
      },
      "source": [
        "from pympler import asizeof"
      ],
      "execution_count": 16,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "t9bX-9k4IAG6",
        "outputId": "c8c08547-ff79-403c-f884-63897106a9ae",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "source": [
        "print(asizeof.asized(learn, detail=1).format())"
      ],
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "<fastai.tabular.learner.TabularLearner object at 0x7f24cfcb8390> size=258032 flat=56\n",
            "    __dict__ size=257976 flat=1184\n",
            "    __class__ size=0 flat=0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TkboSoFfIDKF"
      },
      "source": [
        "You can see it's only 250,000 bytes. That doesn't add up to the ~2.1 MB our exported model is (that's only ~3%). Any help would be appreciated"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NxIOVIOkIyXn"
      },
      "source": [
 Note: If you">
        "> Note: If you decide to investigate `locals()`, make sure to restart the runtime first, as `muppy` will otherwise show its intermediate results."
      ]
    }
  ]
}