Created November 7, 2020 05:18
DebuggingTabularIssue.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "DebuggingTabularIssue.ipynb", | |
"provenance": [], | |
"collapsed_sections": [], | |
"authorship_tag": "ABX9TyPLPsJRdI6vngHgslSY2/K4", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"accelerator": "GPU" | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/muellerzr/c126e75b0265f88c3baa1ea50a4dfe6b/debuggingtabularissue.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "OFjBvIDJGjBz" | |
}, | |
"source": [ | |
"First, install the development versions of fastai and fastcore:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "uGmiTzBYwiDb", | |
"outputId": "2b0dae90-687f-4d69-8277-80e1b2bb94d8", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"!pip install git+https://github.com/fastai/fastai -qqq\n", | |
"!pip install git+https://github.com/fastai/fastcore -qqq" | |
], | |
"execution_count": 1, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
" Building wheel for fastai (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Building wheel for fastcore (setup.py) ... \u001b[?25l\u001b[?25hdone\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "T2TgGzKAGmtz" | |
}, | |
"source": [ | |
"Next we'll import the library:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "g7G6d684mM1x" | |
}, | |
"source": [ | |
"from fastai.tabular.all import *" | |
], | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "8XxG8jkTnN7s" | |
}, | |
"source": [ | |
"We will download the `ADULT_SAMPLE` dataset and load it into a pandas `DataFrame`:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "RoyiwS9CnK-l" | |
}, | |
"source": [ | |
"path = untar_data(URLs.ADULT_SAMPLE)\n", | |
"df = pd.read_csv(path/'adult.csv')" | |
], | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "grxUf3JHvB1f" | |
}, | |
"source": [ | |
"cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']\n", | |
"cont_names = ['age', 'fnlwgt', 'education-num']\n", | |
"procs = [Categorify, FillMissing, Normalize]\n", | |
"y_names = 'salary'\n", | |
"y_block = CategoryBlock()" | |
], | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "tWbwa_8iy8Sa" | |
}, | |
"source": [ | |
"splits = RandomSplitter()(range_of(df))" | |
], | |
"execution_count": 5, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "f5MHZdbfnZKp" | |
}, | |
"source": [ | |
"to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,\n", | |
" y_names=y_names, y_block=y_block, splits=splits)" | |
], | |
"execution_count": 6, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "WqxzBYEHttNx" | |
}, | |
"source": [ | |
"dls = to.dataloaders(bs=200)\n", | |
"learn = tabular_learner(dls, layers=[200,100])" | |
], | |
"execution_count": 7, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "ARODGoeXGtgl" | |
}, | |
"source": [ | |
"Next, we'll export the learner to a file named `testing`:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "IVckiWKat2KT" | |
}, | |
"source": [ | |
"learn.export(\"testing\")" | |
], | |
"execution_count": 8, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "D208e_aHGz6l" | |
}, | |
"source": [ | |
"Then force a restart of the runtime, so nothing from this session stays in memory:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "EiDMTZGw0fN0" | |
}, | |
"source": [ | |
"exit()" | |
], | |
"execution_count": 9, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "6jBsmTp6HJTF" | |
}, | |
"source": [ | |
"After the restart, let's import fastai and the `muppy` module from `pympler`.\n", | |
"\n", | |
"**At this point** there are zero `DataFrame` objects in memory, which we can verify with `muppy`:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "YZsM6_J2HSdY" | |
}, | |
"source": [ | |
"from fastai.tabular.all import *\n", | |
"from pympler import muppy" | |
], | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "RR_2KPirHQzp", | |
"outputId": "d1826b04-9e35-40d3-982b-f329268a748a", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"all_objects = muppy.get_objects()\n", | |
"my_types = muppy.filter(all_objects, Type=pd.DataFrame)\n", | |
"len(my_types)" | |
], | |
"execution_count": 3, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py:126: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead\n", | |
" warnings.warn(\"torch.distributed.reduce_op is deprecated, please use \"\n" | |
], | |
"name": "stderr" | |
}, | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 3 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "goP3S3DAHT2O" | |
}, | |
"source": [ | |
"Let's load our learner back in and see what happens:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "Eaxl0nTyEMg7" | |
}, | |
"source": [ | |
"learn = load_learner('testing')" | |
], | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "1ECTynCc-0fG", | |
"outputId": "1d734456-b3a7-40f2-e983-74b8b9d07015", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"all_objects = muppy.get_objects()\n", | |
"my_types = muppy.filter(all_objects, Type=pd.DataFrame)\n", | |
"len(my_types)" | |
], | |
"execution_count": 5, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py:126: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead\n", | |
" warnings.warn(\"torch.distributed.reduce_op is deprecated, please use \"\n" | |
], | |
"name": "stderr" | |
}, | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"5" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 5 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "MbBXuu5QHY2P" | |
}, | |
"source": [ | |
"Suddenly we have five! We should really only have two: our `train` and our `valid` DataFrames, both empty. Let's look at what each one contains:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "dD6uzTkuEg_j", | |
"outputId": "aa1235fc-f0e4-4e84-d71b-84f997b2af81", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 101 | |
} | |
}, | |
"source": [ | |
"my_types[0].head()" | |
], | |
"execution_count": 10, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>age</th>\n", | |
" <th>workclass</th>\n", | |
" <th>fnlwgt</th>\n", | |
" <th>education</th>\n", | |
" <th>education-num</th>\n", | |
" <th>marital-status</th>\n", | |
" <th>occupation</th>\n", | |
" <th>relationship</th>\n", | |
" <th>race</th>\n", | |
" <th>sex</th>\n", | |
" <th>capital-gain</th>\n", | |
" <th>capital-loss</th>\n", | |
" <th>hours-per-week</th>\n", | |
" <th>native-country</th>\n", | |
" <th>salary</th>\n", | |
" <th>education-num_na</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"Empty DataFrame\n", | |
"Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n", | |
"Index: []" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 10 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "dMsUWGprHktJ", | |
"outputId": "a39e4861-fc5d-431e-c090-cda97d5e356c", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 101 | |
} | |
}, | |
"source": [ | |
"my_types[1].head()" | |
], | |
"execution_count": 11, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>age</th>\n", | |
" <th>workclass</th>\n", | |
" <th>fnlwgt</th>\n", | |
" <th>education</th>\n", | |
" <th>education-num</th>\n", | |
" <th>marital-status</th>\n", | |
" <th>occupation</th>\n", | |
" <th>relationship</th>\n", | |
" <th>race</th>\n", | |
" <th>sex</th>\n", | |
" <th>capital-gain</th>\n", | |
" <th>capital-loss</th>\n", | |
" <th>hours-per-week</th>\n", | |
" <th>native-country</th>\n", | |
" <th>salary</th>\n", | |
" <th>education-num_na</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"Empty DataFrame\n", | |
"Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n", | |
"Index: []" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 11 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "UnPVbKJcHmAd", | |
"outputId": "31e132bb-c19a-4dee-9b0e-ab2b939f194e", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 299 | |
} | |
}, | |
"source": [ | |
"my_types[2].head()" | |
], | |
"execution_count": 12, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>age</th>\n", | |
" <th>workclass</th>\n", | |
" <th>fnlwgt</th>\n", | |
" <th>education</th>\n", | |
" <th>education-num</th>\n", | |
" <th>marital-status</th>\n", | |
" <th>occupation</th>\n", | |
" <th>relationship</th>\n", | |
" <th>race</th>\n", | |
" <th>sex</th>\n", | |
" <th>capital-gain</th>\n", | |
" <th>capital-loss</th>\n", | |
" <th>hours-per-week</th>\n", | |
" <th>native-country</th>\n", | |
" <th>salary</th>\n", | |
" <th>education-num_na</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>19109</th>\n", | |
" <td>-0.626635</td>\n", | |
" <td>5</td>\n", | |
" <td>1.069460</td>\n", | |
" <td>12</td>\n", | |
" <td>-0.424470</td>\n", | |
" <td>5</td>\n", | |
" <td>8</td>\n", | |
" <td>2</td>\n", | |
" <td>5</td>\n", | |
" <td>Male</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>40</td>\n", | |
" <td>United-States</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22624</th>\n", | |
" <td>-1.434871</td>\n", | |
" <td>1</td>\n", | |
" <td>-1.257606</td>\n", | |
" <td>16</td>\n", | |
" <td>-0.030491</td>\n", | |
" <td>5</td>\n", | |
" <td>1</td>\n", | |
" <td>4</td>\n", | |
" <td>5</td>\n", | |
" <td>Female</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>30</td>\n", | |
" <td>Japan</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29269</th>\n", | |
" <td>0.034649</td>\n", | |
" <td>5</td>\n", | |
" <td>-0.333725</td>\n", | |
" <td>10</td>\n", | |
" <td>1.151445</td>\n", | |
" <td>3</td>\n", | |
" <td>5</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>Male</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>45</td>\n", | |
" <td>United-States</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16316</th>\n", | |
" <td>0.181601</td>\n", | |
" <td>5</td>\n", | |
" <td>-0.696871</td>\n", | |
" <td>10</td>\n", | |
" <td>1.151445</td>\n", | |
" <td>3</td>\n", | |
" <td>7</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>Male</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>40</td>\n", | |
" <td>Germany</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21530</th>\n", | |
" <td>-0.112303</td>\n", | |
" <td>3</td>\n", | |
" <td>-0.586142</td>\n", | |
" <td>16</td>\n", | |
" <td>-0.030491</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>5</td>\n", | |
" <td>5</td>\n", | |
" <td>Female</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>40</td>\n", | |
" <td>United-States</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" age workclass fnlwgt ... native-country salary education-num_na\n", | |
"19109 -0.626635 5 1.069460 ... United-States 0 1\n", | |
"22624 -1.434871 1 -1.257606 ... Japan 0 1\n", | |
"29269 0.034649 5 -0.333725 ... United-States 1 1\n", | |
"16316 0.181601 5 -0.696871 ... Germany 0 1\n", | |
"21530 -0.112303 3 -0.586142 ... United-States 0 1\n", | |
"\n", | |
"[5 rows x 16 columns]" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 12 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "QubxTc1CHp7h", | |
"outputId": "bff79dc5-a652-4639-b8bd-033529ff0988", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 101 | |
} | |
}, | |
"source": [ | |
"my_types[3].head()" | |
], | |
"execution_count": 13, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>age</th>\n", | |
" <th>workclass</th>\n", | |
" <th>fnlwgt</th>\n", | |
" <th>education</th>\n", | |
" <th>education-num</th>\n", | |
" <th>marital-status</th>\n", | |
" <th>occupation</th>\n", | |
" <th>relationship</th>\n", | |
" <th>race</th>\n", | |
" <th>sex</th>\n", | |
" <th>capital-gain</th>\n", | |
" <th>capital-loss</th>\n", | |
" <th>hours-per-week</th>\n", | |
" <th>native-country</th>\n", | |
" <th>salary</th>\n", | |
" <th>education-num_na</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"Empty DataFrame\n", | |
"Columns: [age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, salary, education-num_na]\n", | |
"Index: []" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 13 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "0UXh4qucHr0j", | |
"outputId": "fd210efc-65a1-4f28-ad83-b9e8888e6ccc", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 316 | |
} | |
}, | |
"source": [ | |
"my_types[4].head()" | |
], | |
"execution_count": 14, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>age</th>\n", | |
" <th>workclass</th>\n", | |
" <th>fnlwgt</th>\n", | |
" <th>education</th>\n", | |
" <th>education-num</th>\n", | |
" <th>marital-status</th>\n", | |
" <th>occupation</th>\n", | |
" <th>relationship</th>\n", | |
" <th>race</th>\n", | |
" <th>sex</th>\n", | |
" <th>capital-gain</th>\n", | |
" <th>capital-loss</th>\n", | |
" <th>hours-per-week</th>\n", | |
" <th>native-country</th>\n", | |
" <th>salary</th>\n", | |
" <th>education-num_na</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>24346</th>\n", | |
" <td>1.283741</td>\n", | |
" <td>7</td>\n", | |
" <td>-1.345207</td>\n", | |
" <td>16</td>\n", | |
" <td>-0.030491</td>\n", | |
" <td>3</td>\n", | |
" <td>5</td>\n", | |
" <td>6</td>\n", | |
" <td>5</td>\n", | |
" <td>Female</td>\n", | |
" <td>0</td>\n", | |
" <td>1977</td>\n", | |
" <td>50</td>\n", | |
" <td>United-States</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12537</th>\n", | |
" <td>0.108125</td>\n", | |
" <td>8</td>\n", | |
" <td>-0.069612</td>\n", | |
" <td>13</td>\n", | |
" <td>1.545424</td>\n", | |
" <td>3</td>\n", | |
" <td>11</td>\n", | |
" <td>6</td>\n", | |
" <td>2</td>\n", | |
" <td>Female</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>38</td>\n", | |
" <td>China</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5036</th>\n", | |
" <td>0.475505</td>\n", | |
" <td>5</td>\n", | |
" <td>-1.477885</td>\n", | |
" <td>10</td>\n", | |
" <td>1.151445</td>\n", | |
" <td>3</td>\n", | |
" <td>11</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>Male</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>50</td>\n", | |
" <td>United-States</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21718</th>\n", | |
" <td>-1.140967</td>\n", | |
" <td>5</td>\n", | |
" <td>-0.146010</td>\n", | |
" <td>10</td>\n", | |
" <td>1.151445</td>\n", | |
" <td>5</td>\n", | |
" <td>5</td>\n", | |
" <td>2</td>\n", | |
" <td>5</td>\n", | |
" <td>Female</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>40</td>\n", | |
" <td>United-States</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21744</th>\n", | |
" <td>-0.479683</td>\n", | |
" <td>5</td>\n", | |
" <td>0.307879</td>\n", | |
" <td>9</td>\n", | |
" <td>0.363488</td>\n", | |
" <td>3</td>\n", | |
" <td>11</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>Male</td>\n", | |
" <td>7298</td>\n", | |
" <td>0</td>\n", | |
" <td>42</td>\n", | |
" <td>United-States</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" age workclass fnlwgt ... native-country salary education-num_na\n", | |
"24346 1.283741 7 -1.345207 ... United-States 1 1\n", | |
"12537 0.108125 8 -0.069612 ... China 1 1\n", | |
"5036 0.475505 5 -1.477885 ... United-States 1 1\n", | |
"21718 -1.140967 5 -0.146010 ... United-States 0 1\n", | |
"21744 -0.479683 5 0.307879 ... United-States 1 1\n", | |
"\n", | |
"[5 rows x 16 columns]" | |
] | |
}, | |
"metadata": { | |
"tags": [] | |
}, | |
"execution_count": 14 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Hgc_7RoCHges" | |
}, | |
"source": [ | |
"Two of them (`my_types[2]` and `my_types[4]`) still hold rows: they are copies of our processed train and validation DataFrames, *not* what we want!\n", | |
"\n", | |
"The issue is that I don't know where these come from. When I try to measure the size of the `Learner` with `pympler.asizeof`:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "iWZd-gPyFYGE" | |
}, | |
"source": [ | |
"from pympler import asizeof" | |
], | |
"execution_count": 16, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "t9bX-9k4IAG6", | |
"outputId": "c8c08547-ff79-403c-f884-63897106a9ae", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"print(asizeof.asized(learn, detail=1).format())" | |
], | |
"execution_count": 17, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": [ | |
"<fastai.tabular.learner.TabularLearner object at 0x7f24cfcb8390> size=258032 flat=56\n", | |
" __dict__ size=257976 flat=1184\n", | |
" __class__ size=0 flat=0\n" | |
], | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "TkboSoFfIDKF" | |
}, | |
"source": [ | |
"It reports only about 258,000 bytes (~258 KB). That doesn't add up to the ~2.1 MB size of our exported model (it accounts for only ~12% of it). Any help would be appreciated!" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "NxIOVIOkIyXn" | |
}, | |
"source": [ | |
"> Note: If you decide to investigate `locals()`, make sure to restart the runtime first, as `muppy` will otherwise also show its own intermediate results." | |
] | |
} | |
] | |
} |
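The open question is *where* the extra DataFrames come from. One way to find out is to ask Python's garbage collector which objects refer to each surviving DataFrame. The sketch below is a suggestion, not fastai or pympler functionality. `who_holds` and `_labels` are hypothetical helpers written for this note. They rely only on the standard-library functions `gc.collect`, `gc.get_objects` and `gc.get_referrers`. The helpers report direct holders only, which is one level up the reference chain.

```python
import gc
import types


def _labels(container, obj):
    """Describe where `container` stores `obj`: dict key, sequence index or attribute."""
    if isinstance(container, dict):
        return [f"[{k!r}]" for k, v in container.items() if v is obj]
    if isinstance(container, (list, tuple)):
        return [f"[{i}]" for i, v in enumerate(container) if v is obj]
    attrs = getattr(container, '__dict__', None)
    if isinstance(attrs, dict):
        return [f".{k}" for k, v in attrs.items() if v is obj]
    return []


def who_holds(cls):
    """For every live instance of `cls`, list the objects that directly reference it.

    Returns one sorted list per instance, with entries such as 'Holder.data'
    (attribute), 'list[0]' (list index) or "dict['key']" (dict value).
    """
    gc.collect()  # clear unreachable cycles so only live objects remain
    instances = [o for o in gc.get_objects() if isinstance(o, cls)]
    report = []
    # Plain iteration on purpose: enumerate() keeps its last (index, item) tuple
    # alive, which would show up as an extra referrer.
    for inst in instances:
        holders = set()
        referrers = gc.get_referrers(inst)
        for ref in referrers:
            # Skip our own bookkeeping list and any execution frames.
            if ref is instances or isinstance(ref, types.FrameType):
                continue
            owner = ref
            if isinstance(ref, dict):
                # If this dict is some object's (or module's) __dict__, report that object.
                for cand in gc.get_referrers(ref):
                    if cand is not referrers and getattr(cand, '__dict__', None) is ref:
                        owner = cand
                        break
            for label in _labels(owner, inst) or ['<?>']:
                holders.add(type(owner).__name__ + label)
        report.append(sorted(holders))
    return report
```

In this notebook you would run `who_holds(pd.DataFrame)` right after `learn = load_learner('testing')`. It should return one list per DataFrame, and the lists for the two non-empty ones name the objects and attributes keeping them alive. Calling `who_holds` again with that holder's class walks one more level up. Keep in mind that the notebook's own `all_objects` and `my_types` lists also reference the DataFrames, so they will appear as `list[...]` entries if they are still defined.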