@MaxHalford
Created March 5, 2023 21:59
GoDaddy Microbusiness Density Forecasting Competition
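The notebook's introduction describes forecasting several steps ahead by feeding earlier predictions back in as features. As a rough illustration, here is a minimal two-step sketch of that idea in plain Python. It is not the notebook's code: the one-feature least-squares model, the helper names (`fit_line`, `predict_line`, `out_of_fold`, `two_step_forecast`), the fold assignment by row index, and the toy data are all assumptions made for the example. The series are kept in wide form, one row per series and one column per time step, as the notebook does.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0
    return a, mean_y - a * mean_x


def predict_line(model, xs):
    a, b = model
    return [a * x + b for x in xs]


def out_of_fold(xs, ys, n_folds=5):
    """Predict every row with a model fitted on the other folds only."""
    oof = [0.0] * len(xs)
    for k in range(n_folds):
        held_out = [i for i in range(len(xs)) if i % n_folds == k]
        kept = [i for i in range(len(xs)) if i % n_folds != k]
        model = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        preds = predict_line(model, [xs[i] for i in held_out])
        for i, p in zip(held_out, preds):
            oof[i] = p
    return oof


def two_step_forecast(wide):
    """wide: one row per series, all of equal length, one column per time step."""
    lag = [row[-3] for row in wide]  # last value known before step 1
    y1 = [row[-2] for row in wide]   # step-1 target
    y2 = [row[-1] for row in wide]   # step-2 target

    # Step 1: the model sees the true previous value.
    model_1 = fit_line(lag, y1)
    oof_1 = out_of_fold(lag, y1)

    # Step 2: the "previous value" feature is the step-1 out-of-fold
    # prediction, because the truth will not be available when forecasting.
    model_2 = fit_line(oof_1, y2)

    # Forecast past the end of each series, chaining the two models.
    last = [row[-1] for row in wide]
    pred_1 = predict_line(model_1, last)
    pred_2 = predict_line(model_2, pred_1)
    return pred_1, pred_2


# Toy data (hypothetical): 20 series, each growing by 10% per step.
wide = [[base * 1.1 ** t for t in range(6)] for base in range(1, 21)]
pred_1, pred_2 = two_step_forecast(wide)
print(pred_1[0], pred_2[0])
```

The second model is trained on `oof_1` rather than on the true values because, at forecast time, its input is also a prediction. Training on out-of-fold predictions keeps the training and forecasting features comparable.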
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Solution"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This is my solution to [this](https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/) Kaggle competition. This is not a serious attempt at getting a high score. Instead, I developed a methodology for forecasting multiple steps ahead, using past predictions as features.\n",
"\n",
"Take any time series forecasting task. For the first step ahead, you can use the training data to build lagged features. For the second step you can't, because the most recent lag would be the ground truth of the first step, which isn't available at prediction time. However, you can use the prediction obtained for the first step as a proxy. To stay consistent, the second-step model then needs to be trained on the same data as the first-step model, except that the previous value is replaced by the out-of-fold prediction of the first step.\n",
"\n",
"This is quite tricky to get right. In this competition, every time series has the same length, so I stored the values for each step in columns. Every time I predicted a step ahead, I replaced the past values accordingly. I'm not sure how to make this work while keeping the values stored in a single column. There's probably a tidy data trick to do this."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
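"def smape(A, F):\n",
"    # Symmetric mean absolute percentage error, in percent (ranges from 0 to 200).\n",
"    # A holds the actual values and F the forecasts, as NumPy arrays of equal length.\n",
"    return 100 / len(A) * np.sum(2 * np.abs(F - A) / (np.abs(A) + np.abs(F)))"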
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data loading"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cfips</th>\n",
" <th>county</th>\n",
" <th>state</th>\n",
" <th>first_day_of_month</th>\n",
" <th>microbusiness_density</th>\n",
" <th>active</th>\n",
" </tr>\n",
" <tr>\n",
" <th>row_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1001_2019-08-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-08-01</td>\n",
" <td>3.007682</td>\n",
" <td>1249</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-09-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-09-01</td>\n",
" <td>2.884870</td>\n",
" <td>1198</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-10-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-10-01</td>\n",
" <td>3.055843</td>\n",
" <td>1269</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-11-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-11-01</td>\n",
" <td>2.993233</td>\n",
" <td>1243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-12-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-12-01</td>\n",
" <td>2.993233</td>\n",
" <td>1243</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cfips county state first_day_of_month \\\n",
"row_id \n",
"1001_2019-08-01 1001 Autauga County Alabama 2019-08-01 \n",
"1001_2019-09-01 1001 Autauga County Alabama 2019-09-01 \n",
"1001_2019-10-01 1001 Autauga County Alabama 2019-10-01 \n",
"1001_2019-11-01 1001 Autauga County Alabama 2019-11-01 \n",
"1001_2019-12-01 1001 Autauga County Alabama 2019-12-01 \n",
"\n",
" microbusiness_density active \n",
"row_id \n",
"1001_2019-08-01 3.007682 1249 \n",
"1001_2019-09-01 2.884870 1198 \n",
"1001_2019-10-01 3.055843 1269 \n",
"1001_2019-11-01 2.993233 1243 \n",
"1001_2019-12-01 2.993233 1243 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"train = pd.read_csv('data/train.csv', index_col='row_id')\n",
"train.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cfips</th>\n",
" <th>first_day_of_month</th>\n",
" <th>county</th>\n",
" <th>state</th>\n",
" </tr>\n",
" <tr>\n",
" <th>row_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1001_2022-11-01</th>\n",
" <td>1001</td>\n",
" <td>2022-11-01</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1003_2022-11-01</th>\n",
" <td>1003</td>\n",
" <td>2022-11-01</td>\n",
" <td>Baldwin County</td>\n",
" <td>Alabama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1005_2022-11-01</th>\n",
" <td>1005</td>\n",
" <td>2022-11-01</td>\n",
" <td>Barbour County</td>\n",
" <td>Alabama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1007_2022-11-01</th>\n",
" <td>1007</td>\n",
" <td>2022-11-01</td>\n",
" <td>Bibb County</td>\n",
" <td>Alabama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1009_2022-11-01</th>\n",
" <td>1009</td>\n",
" <td>2022-11-01</td>\n",
" <td>Blount County</td>\n",
" <td>Alabama</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cfips first_day_of_month county state\n",
"row_id \n",
"1001_2022-11-01 1001 2022-11-01 Autauga County Alabama\n",
"1003_2022-11-01 1003 2022-11-01 Baldwin County Alabama\n",
"1005_2022-11-01 1005 2022-11-01 Barbour County Alabama\n",
"1007_2022-11-01 1007 2022-11-01 Bibb County Alabama\n",
"1009_2022-11-01 1009 2022-11-01 Blount County Alabama"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cfips_to_county = train.groupby('cfips')['county'].first()\n",
"cfips_to_state = train.groupby('cfips')['state'].first()\n",
"test = (\n",
" pd.read_csv('data/test.csv', index_col='row_id')\n",
" .assign(county=lambda df: df.cfips.map(cfips_to_county))\n",
" .assign(state=lambda df: df.cfips.map(cfips_to_state))\n",
")\n",
"test.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cfips</th>\n",
" <th>county</th>\n",
" <th>state</th>\n",
" <th>month</th>\n",
" <th>target</th>\n",
" <th>active</th>\n",
" <th>is_train</th>\n",
" <th>lat</th>\n",
" <th>lng</th>\n",
" </tr>\n",
" <tr>\n",
" <th>row_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1001_2019-08-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-08-01</td>\n",
" <td>3.007682</td>\n",
" <td>1249.0</td>\n",
" <td>True</td>\n",
" <td>32.535142</td>\n",
" <td>-86.6429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-09-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-09-01</td>\n",
" <td>2.884870</td>\n",
" <td>1198.0</td>\n",
" <td>True</td>\n",
" <td>32.535142</td>\n",
" <td>-86.6429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-10-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-10-01</td>\n",
" <td>3.055843</td>\n",
" <td>1269.0</td>\n",
" <td>True</td>\n",
" <td>32.535142</td>\n",
" <td>-86.6429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-11-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-11-01</td>\n",
" <td>2.993233</td>\n",
" <td>1243.0</td>\n",
" <td>True</td>\n",
" <td>32.535142</td>\n",
" <td>-86.6429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-12-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2019-12-01</td>\n",
" <td>2.993233</td>\n",
" <td>1243.0</td>\n",
" <td>True</td>\n",
" <td>32.535142</td>\n",
" <td>-86.6429</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cfips county state month target active \\\n",
"row_id \n",
"1001_2019-08-01 1001 Autauga County Alabama 2019-08-01 3.007682 1249.0 \n",
"1001_2019-09-01 1001 Autauga County Alabama 2019-09-01 2.884870 1198.0 \n",
"1001_2019-10-01 1001 Autauga County Alabama 2019-10-01 3.055843 1269.0 \n",
"1001_2019-11-01 1001 Autauga County Alabama 2019-11-01 2.993233 1243.0 \n",
"1001_2019-12-01 1001 Autauga County Alabama 2019-12-01 2.993233 1243.0 \n",
"\n",
" is_train lat lng \n",
"row_id \n",
"1001_2019-08-01 True 32.535142 -86.6429 \n",
"1001_2019-09-01 True 32.535142 -86.6429 \n",
"1001_2019-10-01 True 32.535142 -86.6429 \n",
"1001_2019-11-01 True 32.535142 -86.6429 \n",
"1001_2019-12-01 True 32.535142 -86.6429 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds = pd.concat([train, test])\n",
"ds['cfips'] = ds['cfips'].astype('category')\n",
"ds['county'] = ds['county'].astype('category')\n",
"ds['state'] = ds['state'].astype('category')\n",
"ds['is_train'] = ds.microbusiness_density.notnull()\n",
"ds = ds.rename(columns={'first_day_of_month': 'month', 'microbusiness_density': 'target'})\n",
"ds.month = pd.to_datetime(ds.month)\n",
"locations = pd.read_csv('data/cfips_location.csv', index_col='cfips')\n",
"ds = ds.join(locations[['lat', 'lng']], on='cfips')\n",
"ds.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"hovertemplate": "month=%{x}<br>target=%{y}<extra></extra>",
"legendgroup": "",
"line": {
"color": "#636efa",
"dash": "solid"
},
"marker": {
"symbol": "circle"
},
"mode": "lines",
"name": "",
"orientation": "v",
"showlegend": false,
"type": "scatter",
"x": [
"2019-08-01T00:00:00",
"2019-09-01T00:00:00",
"2019-10-01T00:00:00",
"2019-11-01T00:00:00",
"2019-12-01T00:00:00",
"2020-01-01T00:00:00",
"2020-02-01T00:00:00",
"2020-03-01T00:00:00",
"2020-04-01T00:00:00",
"2020-05-01T00:00:00",
"2020-06-01T00:00:00",
"2020-07-01T00:00:00",
"2020-08-01T00:00:00",
"2020-09-01T00:00:00",
"2020-10-01T00:00:00",
"2020-11-01T00:00:00",
"2020-12-01T00:00:00",
"2021-01-01T00:00:00",
"2021-02-01T00:00:00",
"2021-03-01T00:00:00",
"2021-04-01T00:00:00",
"2021-05-01T00:00:00",
"2021-06-01T00:00:00",
"2021-07-01T00:00:00",
"2021-08-01T00:00:00",
"2021-09-01T00:00:00",
"2021-10-01T00:00:00",
"2021-11-01T00:00:00",
"2021-12-01T00:00:00",
"2022-01-01T00:00:00",
"2022-02-01T00:00:00",
"2022-03-01T00:00:00",
"2022-04-01T00:00:00",
"2022-05-01T00:00:00",
"2022-06-01T00:00:00",
"2022-07-01T00:00:00",
"2022-08-01T00:00:00",
"2022-09-01T00:00:00",
"2022-10-01T00:00:00",
"2022-11-01T00:00:00",
"2022-12-01T00:00:00",
"2023-01-01T00:00:00",
"2023-02-01T00:00:00",
"2023-03-01T00:00:00",
"2023-04-01T00:00:00",
"2023-05-01T00:00:00",
"2023-06-01T00:00:00"
],
"xaxis": "x",
"y": [
12.555554,
12.50948,
12.535927,
12.550398,
12.491517,
12.376486,
12.067235,
12.139173,
12.192754,
12.171586,
12.184155,
12.261385,
12.288671,
12.253282,
12.22186,
12.155214,
12.149261,
11.946164,
11.066442,
11.124837,
11.142438,
11.099834,
10.978436,
10.986661,
10.988141,
10.952939,
10.972843,
11.387701,
11.44034,
11.462683,
11.434597,
11.529694,
11.532157,
11.43936,
11.544476,
11.664701,
11.622984,
11.615921,
11.625118,
null,
null,
null,
null,
null,
null,
null,
null
],
"yaxis": "y"
}
],
"layout": {
"legend": {
"tracegroupgap": 0
},
"title": {
"text": "6081"
},
"xaxis": {
"anchor": "y",
"domain": [
0,
1
],
"title": {
"text": "month"
}
},
"yaxis": {
"anchor": "x",
"domain": [
0,
1
],
"title": {
"text": "target"
}
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"cfips = ds.query('is_train').cfips.sample().unique()[0]\n",
"#cfips = 32029\n",
"ds.query('cfips == @cfips').plot(x='month', y='target', title=str(cfips))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pct_bb_2017</th>\n",
" <th>pct_bb_2018</th>\n",
" <th>pct_bb_2019</th>\n",
" <th>pct_bb_2020</th>\n",
" <th>pct_bb_2021</th>\n",
" <th>pct_college_2017</th>\n",
" <th>pct_college_2018</th>\n",
" <th>pct_college_2019</th>\n",
" <th>pct_college_2020</th>\n",
" <th>pct_college_2021</th>\n",
" <th>pct_foreign_born_2017</th>\n",
" <th>pct_foreign_born_2018</th>\n",
" <th>pct_foreign_born_2019</th>\n",
" <th>pct_foreign_born_2020</th>\n",
" <th>pct_foreign_born_2021</th>\n",
" <th>pct_it_workers_2017</th>\n",
" <th>pct_it_workers_2018</th>\n",
" <th>pct_it_workers_2019</th>\n",
" <th>pct_it_workers_2020</th>\n",
" <th>pct_it_workers_2021</th>\n",
" <th>median_hh_inc_2017</th>\n",
" <th>median_hh_inc_2018</th>\n",
" <th>median_hh_inc_2019</th>\n",
" <th>median_hh_inc_2020</th>\n",
" <th>median_hh_inc_2021</th>\n",
" </tr>\n",
" <tr>\n",
" <th>cfips</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1001</th>\n",
" <td>76.6</td>\n",
" <td>78.9</td>\n",
" <td>80.6</td>\n",
" <td>82.7</td>\n",
" <td>85.5</td>\n",
" <td>14.5</td>\n",
" <td>15.9</td>\n",
" <td>16.1</td>\n",
" <td>16.7</td>\n",
" <td>16.4</td>\n",
" <td>2.1</td>\n",
" <td>2.0</td>\n",
" <td>2.3</td>\n",
" <td>2.3</td>\n",
" <td>2.1</td>\n",
" <td>1.3</td>\n",
" <td>1.1</td>\n",
" <td>0.7</td>\n",
" <td>0.6</td>\n",
" <td>1.1</td>\n",
" <td>55317</td>\n",
" <td>58786.0</td>\n",
" <td>58731</td>\n",
" <td>57982.0</td>\n",
" <td>62660.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1003</th>\n",
" <td>74.5</td>\n",
" <td>78.1</td>\n",
" <td>81.8</td>\n",
" <td>85.1</td>\n",
" <td>87.9</td>\n",
" <td>20.4</td>\n",
" <td>20.7</td>\n",
" <td>21.0</td>\n",
" <td>20.2</td>\n",
" <td>20.6</td>\n",
" <td>3.2</td>\n",
" <td>3.4</td>\n",
" <td>3.7</td>\n",
" <td>3.4</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>1.3</td>\n",
" <td>1.4</td>\n",
" <td>1.0</td>\n",
" <td>1.3</td>\n",
" <td>52562</td>\n",
" <td>55962.0</td>\n",
" <td>58320</td>\n",
" <td>61756.0</td>\n",
" <td>64346.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1005</th>\n",
" <td>57.2</td>\n",
" <td>60.4</td>\n",
" <td>60.5</td>\n",
" <td>64.6</td>\n",
" <td>64.6</td>\n",
" <td>7.6</td>\n",
" <td>7.8</td>\n",
" <td>7.6</td>\n",
" <td>7.3</td>\n",
" <td>6.7</td>\n",
" <td>2.7</td>\n",
" <td>2.5</td>\n",
" <td>2.7</td>\n",
" <td>2.6</td>\n",
" <td>2.6</td>\n",
" <td>0.5</td>\n",
" <td>0.3</td>\n",
" <td>0.8</td>\n",
" <td>1.1</td>\n",
" <td>0.8</td>\n",
" <td>33368</td>\n",
" <td>34186.0</td>\n",
" <td>32525</td>\n",
" <td>34990.0</td>\n",
" <td>36422.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1007</th>\n",
" <td>62.0</td>\n",
" <td>66.1</td>\n",
" <td>69.2</td>\n",
" <td>76.1</td>\n",
" <td>74.6</td>\n",
" <td>8.1</td>\n",
" <td>7.6</td>\n",
" <td>6.5</td>\n",
" <td>7.4</td>\n",
" <td>7.9</td>\n",
" <td>1.0</td>\n",
" <td>1.4</td>\n",
" <td>1.5</td>\n",
" <td>1.6</td>\n",
" <td>1.1</td>\n",
" <td>1.2</td>\n",
" <td>1.4</td>\n",
" <td>1.6</td>\n",
" <td>1.7</td>\n",
" <td>2.1</td>\n",
" <td>43404</td>\n",
" <td>45340.0</td>\n",
" <td>47542</td>\n",
" <td>51721.0</td>\n",
" <td>54277.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1009</th>\n",
" <td>65.8</td>\n",
" <td>68.5</td>\n",
" <td>73.0</td>\n",
" <td>79.6</td>\n",
" <td>81.0</td>\n",
" <td>8.7</td>\n",
" <td>8.1</td>\n",
" <td>8.6</td>\n",
" <td>8.9</td>\n",
" <td>9.3</td>\n",
" <td>4.5</td>\n",
" <td>4.4</td>\n",
" <td>4.5</td>\n",
" <td>4.4</td>\n",
" <td>4.5</td>\n",
" <td>1.3</td>\n",
" <td>1.4</td>\n",
" <td>0.9</td>\n",
" <td>1.1</td>\n",
" <td>0.9</td>\n",
" <td>47412</td>\n",
" <td>48695.0</td>\n",
" <td>49358</td>\n",
" <td>48922.0</td>\n",
" <td>52830.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56037</th>\n",
" <td>82.2</td>\n",
" <td>82.4</td>\n",
" <td>84.0</td>\n",
" <td>86.7</td>\n",
" <td>88.4</td>\n",
" <td>15.3</td>\n",
" <td>15.2</td>\n",
" <td>14.8</td>\n",
" <td>13.7</td>\n",
" <td>12.4</td>\n",
" <td>5.0</td>\n",
" <td>5.3</td>\n",
" <td>4.7</td>\n",
" <td>5.2</td>\n",
" <td>5.5</td>\n",
" <td>0.6</td>\n",
" <td>0.6</td>\n",
" <td>1.0</td>\n",
" <td>0.9</td>\n",
" <td>1.0</td>\n",
" <td>71083</td>\n",
" <td>73008.0</td>\n",
" <td>74843</td>\n",
" <td>73384.0</td>\n",
" <td>76668.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56039</th>\n",
" <td>83.5</td>\n",
" <td>85.9</td>\n",
" <td>87.1</td>\n",
" <td>89.1</td>\n",
" <td>90.5</td>\n",
" <td>37.7</td>\n",
" <td>37.8</td>\n",
" <td>38.9</td>\n",
" <td>37.2</td>\n",
" <td>38.3</td>\n",
" <td>10.8</td>\n",
" <td>11.2</td>\n",
" <td>11.8</td>\n",
" <td>11.4</td>\n",
" <td>11.1</td>\n",
" <td>0.7</td>\n",
" <td>1.2</td>\n",
" <td>1.4</td>\n",
" <td>1.5</td>\n",
" <td>2.0</td>\n",
" <td>80049</td>\n",
" <td>83831.0</td>\n",
" <td>84678</td>\n",
" <td>87053.0</td>\n",
" <td>94498.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56041</th>\n",
" <td>83.8</td>\n",
" <td>88.2</td>\n",
" <td>89.5</td>\n",
" <td>91.4</td>\n",
" <td>90.6</td>\n",
" <td>11.9</td>\n",
" <td>10.5</td>\n",
" <td>11.1</td>\n",
" <td>12.6</td>\n",
" <td>12.3</td>\n",
" <td>2.9</td>\n",
" <td>3.1</td>\n",
" <td>2.9</td>\n",
" <td>2.9</td>\n",
" <td>2.9</td>\n",
" <td>1.2</td>\n",
" <td>1.2</td>\n",
" <td>1.4</td>\n",
" <td>1.7</td>\n",
" <td>0.9</td>\n",
" <td>54672</td>\n",
" <td>58235.0</td>\n",
" <td>63403</td>\n",
" <td>72458.0</td>\n",
" <td>75106.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56043</th>\n",
" <td>76.4</td>\n",
" <td>78.3</td>\n",
" <td>78.2</td>\n",
" <td>82.8</td>\n",
" <td>85.4</td>\n",
" <td>15.4</td>\n",
" <td>15.0</td>\n",
" <td>15.4</td>\n",
" <td>15.0</td>\n",
" <td>17.2</td>\n",
" <td>2.3</td>\n",
" <td>1.4</td>\n",
" <td>1.6</td>\n",
" <td>2.2</td>\n",
" <td>1.0</td>\n",
" <td>1.3</td>\n",
" <td>1.0</td>\n",
" <td>0.9</td>\n",
" <td>0.9</td>\n",
" <td>1.1</td>\n",
" <td>51362</td>\n",
" <td>53426.0</td>\n",
" <td>54158</td>\n",
" <td>57306.0</td>\n",
" <td>62271.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56045</th>\n",
" <td>71.1</td>\n",
" <td>73.3</td>\n",
" <td>76.8</td>\n",
" <td>79.7</td>\n",
" <td>81.3</td>\n",
" <td>14.1</td>\n",
" <td>13.5</td>\n",
" <td>13.4</td>\n",
" <td>12.7</td>\n",
" <td>13.9</td>\n",
" <td>3.8</td>\n",
" <td>4.1</td>\n",
" <td>1.7</td>\n",
" <td>2.3</td>\n",
" <td>1.6</td>\n",
" <td>0.6</td>\n",
" <td>0.6</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>59605</td>\n",
" <td>52867.0</td>\n",
" <td>57031</td>\n",
" <td>53333.0</td>\n",
" <td>65566.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3142 rows × 25 columns</p>\n",
"</div>"
],
"text/plain": [
" pct_bb_2017 pct_bb_2018 pct_bb_2019 pct_bb_2020 pct_bb_2021 \\\n",
"cfips \n",
"1001 76.6 78.9 80.6 82.7 85.5 \n",
"1003 74.5 78.1 81.8 85.1 87.9 \n",
"1005 57.2 60.4 60.5 64.6 64.6 \n",
"1007 62.0 66.1 69.2 76.1 74.6 \n",
"1009 65.8 68.5 73.0 79.6 81.0 \n",
"... ... ... ... ... ... \n",
"56037 82.2 82.4 84.0 86.7 88.4 \n",
"56039 83.5 85.9 87.1 89.1 90.5 \n",
"56041 83.8 88.2 89.5 91.4 90.6 \n",
"56043 76.4 78.3 78.2 82.8 85.4 \n",
"56045 71.1 73.3 76.8 79.7 81.3 \n",
"\n",
" pct_college_2017 pct_college_2018 pct_college_2019 pct_college_2020 \\\n",
"cfips \n",
"1001 14.5 15.9 16.1 16.7 \n",
"1003 20.4 20.7 21.0 20.2 \n",
"1005 7.6 7.8 7.6 7.3 \n",
"1007 8.1 7.6 6.5 7.4 \n",
"1009 8.7 8.1 8.6 8.9 \n",
"... ... ... ... ... \n",
"56037 15.3 15.2 14.8 13.7 \n",
"56039 37.7 37.8 38.9 37.2 \n",
"56041 11.9 10.5 11.1 12.6 \n",
"56043 15.4 15.0 15.4 15.0 \n",
"56045 14.1 13.5 13.4 12.7 \n",
"\n",
" pct_college_2021 pct_foreign_born_2017 pct_foreign_born_2018 \\\n",
"cfips \n",
"1001 16.4 2.1 2.0 \n",
"1003 20.6 3.2 3.4 \n",
"1005 6.7 2.7 2.5 \n",
"1007 7.9 1.0 1.4 \n",
"1009 9.3 4.5 4.4 \n",
"... ... ... ... \n",
"56037 12.4 5.0 5.3 \n",
"56039 38.3 10.8 11.2 \n",
"56041 12.3 2.9 3.1 \n",
"56043 17.2 2.3 1.4 \n",
"56045 13.9 3.8 4.1 \n",
"\n",
" pct_foreign_born_2019 pct_foreign_born_2020 pct_foreign_born_2021 \\\n",
"cfips \n",
"1001 2.3 2.3 2.1 \n",
"1003 3.7 3.4 3.5 \n",
"1005 2.7 2.6 2.6 \n",
"1007 1.5 1.6 1.1 \n",
"1009 4.5 4.4 4.5 \n",
"... ... ... ... \n",
"56037 4.7 5.2 5.5 \n",
"56039 11.8 11.4 11.1 \n",
"56041 2.9 2.9 2.9 \n",
"56043 1.6 2.2 1.0 \n",
"56045 1.7 2.3 1.6 \n",
"\n",
" pct_it_workers_2017 pct_it_workers_2018 pct_it_workers_2019 \\\n",
"cfips \n",
"1001 1.3 1.1 0.7 \n",
"1003 1.4 1.3 1.4 \n",
"1005 0.5 0.3 0.8 \n",
"1007 1.2 1.4 1.6 \n",
"1009 1.3 1.4 0.9 \n",
"... ... ... ... \n",
"56037 0.6 0.6 1.0 \n",
"56039 0.7 1.2 1.4 \n",
"56041 1.2 1.2 1.4 \n",
"56043 1.3 1.0 0.9 \n",
"56045 0.6 0.6 0.0 \n",
"\n",
" pct_it_workers_2020 pct_it_workers_2021 median_hh_inc_2017 \\\n",
"cfips \n",
"1001 0.6 1.1 55317 \n",
"1003 1.0 1.3 52562 \n",
"1005 1.1 0.8 33368 \n",
"1007 1.7 2.1 43404 \n",
"1009 1.1 0.9 47412 \n",
"... ... ... ... \n",
"56037 0.9 1.0 71083 \n",
"56039 1.5 2.0 80049 \n",
"56041 1.7 0.9 54672 \n",
"56043 0.9 1.1 51362 \n",
"56045 0.0 0.0 59605 \n",
"\n",
" median_hh_inc_2018 median_hh_inc_2019 median_hh_inc_2020 \\\n",
"cfips \n",
"1001 58786.0 58731 57982.0 \n",
"1003 55962.0 58320 61756.0 \n",
"1005 34186.0 32525 34990.0 \n",
"1007 45340.0 47542 51721.0 \n",
"1009 48695.0 49358 48922.0 \n",
"... ... ... ... \n",
"56037 73008.0 74843 73384.0 \n",
"56039 83831.0 84678 87053.0 \n",
"56041 58235.0 63403 72458.0 \n",
"56043 53426.0 54158 57306.0 \n",
"56045 52867.0 57031 53333.0 \n",
"\n",
" median_hh_inc_2021 \n",
"cfips \n",
"1001 62660.0 \n",
"1003 64346.0 \n",
"1005 36422.0 \n",
"1007 54277.0 \n",
"1009 52830.0 \n",
"... ... \n",
"56037 76668.0 \n",
"56039 94498.0 \n",
"56041 75106.0 \n",
"56043 62271.0 \n",
"56045 65566.0 \n",
"\n",
"[3142 rows x 25 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv('data/census_starter.csv', index_col='cfips')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Target engineering"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/8q/xqx16rw14bl7rqyybl7zk8sh0000gn/T/ipykernel_99328/3922010533.py:7: RuntimeWarning:\n",
"\n",
"invalid value encountered in scalar subtract\n",
"\n"
]
}
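],
"source": [
"# Smooth level shifts: walk backwards through each county's series and, whenever a\n",
"# month-over-month jump exceeds 20% of the mean of the preceding values, rescale\n",
"# everything before the jump so the series stays continuous.\n",
"for cfips in ds['cfips'].unique():\n",
"    indices = ds['cfips'].eq(cfips)\n",
"    val = ds.loc[indices, 'target'].values.copy()\n",
"\n",
"    for i in range(37, 2, -1):\n",
"        threshold = 0.2 * np.mean(val[:i])\n",
"        difa = abs(val[i] - val[i - 1])\n",
"        if difa >= threshold:\n",
"            val[:i] *= val[i] / val[i - 1]\n",
"\n",
"    val[0] = val[1] * 0.99\n",
"    ds.loc[indices, 'target'] = val"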
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature engineering"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"for shift in range(1, 39):\n",
"    ds[f'target-{shift}'] = ds.groupby('cfips')['target'].shift(shift)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"ds['first_target_value'] = ds.groupby('cfips')['target'].transform('first')"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cfips</th>\n",
" <th>county</th>\n",
" <th>state</th>\n",
" <th>target-1</th>\n",
" <th>target-2</th>\n",
" <th>target-3</th>\n",
" <th>target-4</th>\n",
" <th>target-5</th>\n",
" <th>target-6</th>\n",
" <th>target-7</th>\n",
" <th>target-8</th>\n",
" <th>target-9</th>\n",
" <th>target-10</th>\n",
" <th>target-11</th>\n",
" <th>target-12</th>\n",
" <th>target-13</th>\n",
" <th>target-14</th>\n",
" <th>target-15</th>\n",
" <th>target-16</th>\n",
" <th>target-17</th>\n",
" <th>target-18</th>\n",
" <th>target-19</th>\n",
" <th>target-20</th>\n",
" <th>target-21</th>\n",
" <th>target-22</th>\n",
" <th>target-23</th>\n",
" <th>first_target_value</th>\n",
" <th>target_rolling_3_mean</th>\n",
" <th>target_rolling_6_mean</th>\n",
" <th>target_rolling_9_mean</th>\n",
" <th>target_rolling_12_mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>row_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1001_2019-08-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.856021</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-09-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2.856021</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.856021</td>\n",
" <td>2.856021</td>\n",
" <td>2.856021</td>\n",
" <td>2.856021</td>\n",
" <td>2.856021</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-10-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2.884870</td>\n",
" <td>2.856021</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.856021</td>\n",
" <td>2.870446</td>\n",
" <td>2.870446</td>\n",
" <td>2.870446</td>\n",
" <td>2.870446</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-11-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>3.055843</td>\n",
" <td>2.884870</td>\n",
" <td>2.856021</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.856021</td>\n",
" <td>2.970357</td>\n",
" <td>2.932245</td>\n",
" <td>2.932245</td>\n",
" <td>2.932245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001_2019-12-01</th>\n",
" <td>1001</td>\n",
" <td>Autauga County</td>\n",
" <td>Alabama</td>\n",
" <td>2.993233</td>\n",
" <td>3.055843</td>\n",
" <td>2.884870</td>\n",
" <td>2.856021</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.856021</td>\n",
" <td>3.024538</td>\n",
" <td>2.947492</td>\n",
" <td>2.947492</td>\n",
" <td>2.947492</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cfips county state target-1 target-2 target-3 \\\n",
"row_id \n",
"1001_2019-08-01 1001 Autauga County Alabama NaN NaN NaN \n",
"1001_2019-09-01 1001 Autauga County Alabama 2.856021 NaN NaN \n",
"1001_2019-10-01 1001 Autauga County Alabama 2.884870 2.856021 NaN \n",
"1001_2019-11-01 1001 Autauga County Alabama 3.055843 2.884870 2.856021 \n",
"1001_2019-12-01 1001 Autauga County Alabama 2.993233 3.055843 2.884870 \n",
"\n",
" target-4 target-5 target-6 target-7 target-8 target-9 \\\n",
"row_id \n",
"1001_2019-08-01 NaN NaN NaN NaN NaN NaN \n",
"1001_2019-09-01 NaN NaN NaN NaN NaN NaN \n",
"1001_2019-10-01 NaN NaN NaN NaN NaN NaN \n",
"1001_2019-11-01 NaN NaN NaN NaN NaN NaN \n",
"1001_2019-12-01 2.856021 NaN NaN NaN NaN NaN \n",
"\n",
" target-10 target-11 target-12 target-13 target-14 \\\n",
"row_id \n",
"1001_2019-08-01 NaN NaN NaN NaN NaN \n",
"1001_2019-09-01 NaN NaN NaN NaN NaN \n",
"1001_2019-10-01 NaN NaN NaN NaN NaN \n",
"1001_2019-11-01 NaN NaN NaN NaN NaN \n",
"1001_2019-12-01 NaN NaN NaN NaN NaN \n",
"\n",
" target-15 target-16 target-17 target-18 target-19 \\\n",
"row_id \n",
"1001_2019-08-01 NaN NaN NaN NaN NaN \n",
"1001_2019-09-01 NaN NaN NaN NaN NaN \n",
"1001_2019-10-01 NaN NaN NaN NaN NaN \n",
"1001_2019-11-01 NaN NaN NaN NaN NaN \n",
"1001_2019-12-01 NaN NaN NaN NaN NaN \n",
"\n",
" target-20 target-21 target-22 target-23 \\\n",
"row_id \n",
"1001_2019-08-01 NaN NaN NaN NaN \n",
"1001_2019-09-01 NaN NaN NaN NaN \n",
"1001_2019-10-01 NaN NaN NaN NaN \n",
"1001_2019-11-01 NaN NaN NaN NaN \n",
"1001_2019-12-01 NaN NaN NaN NaN \n",
"\n",
" first_target_value target_rolling_3_mean \\\n",
"row_id \n",
"1001_2019-08-01 2.856021 NaN \n",
"1001_2019-09-01 2.856021 2.856021 \n",
"1001_2019-10-01 2.856021 2.870446 \n",
"1001_2019-11-01 2.856021 2.970357 \n",
"1001_2019-12-01 2.856021 3.024538 \n",
"\n",
" target_rolling_6_mean target_rolling_9_mean \\\n",
"row_id \n",
"1001_2019-08-01 NaN NaN \n",
"1001_2019-09-01 2.856021 2.856021 \n",
"1001_2019-10-01 2.870446 2.870446 \n",
"1001_2019-11-01 2.932245 2.932245 \n",
"1001_2019-12-01 2.947492 2.947492 \n",
"\n",
" target_rolling_12_mean \n",
"row_id \n",
"1001_2019-08-01 NaN \n",
"1001_2019-09-01 2.856021 \n",
"1001_2019-10-01 2.870446 \n",
"1001_2019-11-01 2.932245 \n",
"1001_2019-12-01 2.947492 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from scipy import stats\n",
"from sklearn import preprocessing\n",
"\n",
"not_features = ['target', 'month', 'active', 'is_train', 'lat', 'lng']\n",
"\n",
"def extract_features(X):\n",
"    Z = X.copy()\n",
"    # Each target_rolling_{w}_mean averages lags target-1 .. target-(w - 1), i.e. w - 1 values\n",
"    for w in [3, 6, 9, 12]:\n",
"        Z[f'target_rolling_{w}_mean'] = (\n",
"            Z[[f'target-{k}' for k in range(1, w)]]\n",
"            .mean(axis='columns')\n",
"        )\n",
"    # Only lags 1 to 23 are kept as features\n",
"    Z = Z.drop(columns=(\n",
"        not_features +\n",
"        [f'target-{k}' for k in range(24, 39)]\n",
"    ))\n",
"    return Z\n",
"\n",
"extract_features(ds).head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Baseline"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.4888\n"
]
}
],
"source": [
"D = ds.copy()\n",
"oof = D[D.is_train & (D.month > '2019-08-01')]\n",
"print(smape(oof.target, oof['target-1']).round(4))\n",
"D[~D.is_train]['target-1'].rename('microbusiness_density').to_csv('baseline_submission.csv.zip', header=True)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"cfips\n",
"56037 4.0933\n",
"56039 7.5138\n",
"56041 3.7991\n",
"56043 3.7272\n",
"56045 3.1007\n",
"dtype: float64"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"    D[D.is_train & (D.month > '2019-08-01')]\n",
"    .groupby('cfips')\n",
"    .apply(lambda df: smape(df.target, df['target-1']).round(4))\n",
"    .tail(5)\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[620]\tfit's huber: 1.83369\tval's huber: 1.84429\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[186]\tfit's huber: 1.8359\tval's huber: 1.84012\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[215]\tfit's huber: 1.84221\tval's huber: 1.8274\n",
"t+1 - SMAPE: 1.6249\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[280]\tfit's huber: 1.83373\tval's huber: 1.84432\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[483]\tfit's huber: 1.83588\tval's huber: 1.84042\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[210]\tfit's huber: 1.84222\tval's huber: 1.82718\n",
"t+2 - SMAPE: 1.6964\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[266]\tfit's huber: 1.83374\tval's huber: 1.84421\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[344]\tfit's huber: 1.8359\tval's huber: 1.8404\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[189]\tfit's huber: 1.84223\tval's huber: 1.82705\n",
"t+3 - SMAPE: 1.7736\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[486]\tfit's huber: 1.83373\tval's huber: 1.84417\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[181]\tfit's huber: 1.83593\tval's huber: 1.84055\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[225]\tfit's huber: 1.84224\tval's huber: 1.827\n",
"t+4 - SMAPE: 1.8577\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[309]\tfit's huber: 1.83376\tval's huber: 1.84409\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[211]\tfit's huber: 1.83594\tval's huber: 1.84063\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[220]\tfit's huber: 1.84225\tval's huber: 1.82686\n",
"t+5 - SMAPE: 1.9731\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[386]\tfit's huber: 1.83376\tval's huber: 1.84419\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[201]\tfit's huber: 1.83596\tval's huber: 1.8409\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[209]\tfit's huber: 1.84227\tval's huber: 1.82693\n",
"t+6 - SMAPE: 2.1009\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[819]\tfit's huber: 1.83374\tval's huber: 1.84413\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[248]\tfit's huber: 1.83597\tval's huber: 1.84089\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[385]\tfit's huber: 1.84227\tval's huber: 1.82687\n",
"t+7 - SMAPE: 2.2476\n",
"\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[494]\tfit's huber: 1.83378\tval's huber: 1.84426\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[220]\tfit's huber: 1.83601\tval's huber: 1.84106\n",
"Training until validation scores don't improve for 50 rounds\n",
"Early stopping, best iteration is:\n",
"[176]\tfit's huber: 1.84232\tval's huber: 1.82694\n",
"t+8 - SMAPE: 2.4071\n",
"\n"
]
}
],
"source": [
"import warnings\n",
"import chime\n",
"import lightgbm as lgb\n",
"from sklearn import compose\n",
"from sklearn import ensemble\n",
"from sklearn import model_selection\n",
"\n",
"chime.theme('zelda')\n",
"\n",
"cut_off = pd.Timestamp(year=2022, month=10, day=1)\n",
"one_month = pd.tseries.offsets.DateOffset(months=1)\n",
"horizon = 8\n",
"\n",
"D = ds.copy()\n",
"is_train = (\n",
"    D.is_train\n",
"    & (D.month > pd.Timestamp(year=2019, month=8, day=1))  # skip first month\n",
"    & D.target.notnull()\n",
"    & ~D.target.eq(np.inf)\n",
")\n",
"all_oof = pd.DataFrame(index=D[is_train].index)\n",
"\n",
"model = lgb.LGBMRegressor(\n",
"    n_estimators=1000,\n",
"    verbosity=-1,\n",
"    objective='huber',\n",
"    random_state=42,\n",
"    max_depth=12,\n",
"    learning_rate=0.08,\n",
"    min_child_samples=20\n",
")\n",
"model = compose.TransformedTargetRegressor(\n",
"    regressor=model,\n",
"    func=np.log1p,\n",
"    inverse_func=np.expm1\n",
")\n",
"\n",
"cv = model_selection.KFold(n_splits=3, shuffle=True, random_state=42)\n",
"\n",
"perf = []\n",
"\n",
"for h in range(1, horizon + 1):\n",
"\n",
"    # We want to predict h step(s) ahead\n",
"    is_test = D.month == (cut_off + h * one_month)\n",
"    is_test_next = D.month == (cut_off + (h + 1) * one_month)\n",
"\n",
"    # We will store out-of-fold and test predictions within each round\n",
"    oof = pd.Series(0.0, index=D[is_train].index)\n",
"    predictions = pd.Series(0.0, index=D[is_test].index)\n",
"\n",
"    # We redo the feature engineering at each step because the lagged\n",
"    # target columns are overwritten with predictions as we go\n",
"    features = extract_features(D)\n",
"    X_train = features[is_train]\n",
"    y_train = D[is_train].target\n",
"    X_test = features[is_test]\n",
"\n",
"    # Cross-validated fit/predict\n",
"    for fit_idx, val_idx in cv.split(X_train, y_train):\n",
"        X_fit = X_train.iloc[fit_idx]\n",
"        X_val = X_train.iloc[val_idx]\n",
"        y_fit = y_train.iloc[fit_idx]\n",
"        y_val = y_train.iloc[val_idx]\n",
"\n",
"        with warnings.catch_warnings(category=UserWarning):\n",
"            warnings.simplefilter('ignore')\n",
"            # NOTE: TransformedTargetRegressor applies log1p to y_fit only. The\n",
"            # eval_set targets are forwarded untransformed, so the logged huber\n",
"            # values and early stopping are computed against raw targets.\n",
"            model.fit(\n",
"                X_fit, y_fit,\n",
"                eval_set=[(X_fit, y_fit), (X_val, y_val)],\n",
"                eval_names=('fit', 'val'),\n",
"                categorical_feature=[\"state\"],\n",
"                callbacks=[\n",
"                    lgb.early_stopping(50),\n",
"                    #lgb.print_evaluation(100)\n",
"                ]\n",
"            )\n",
"        oof.iloc[val_idx] = model.predict(X_val)\n",
"        predictions += model.predict(X_test) / cv.n_splits\n",
"\n",
"    all_oof.loc[oof.index, h] = oof.values\n",
"    D.loc[predictions.index, 'target'] = predictions.values\n",
"    msg = f't+{h} - SMAPE: {smape(y_train, oof):.4f}'\n",
"    perf.append(msg)\n",
"    print(msg, end='\\n\\n')\n",
"\n",
"    # Update the training and test sets for the next step ahead\n",
"    for k in range(h, 1, -1):\n",
"        D.loc[is_train, f'target-{k}'] = D[is_train].groupby('cfips')[f'target-{k - 1}'].shift(1)\n",
"        if h < horizon:\n",
"            D.loc[is_test_next, f'target-{k}'] = D[is_test][f'target-{k - 1}'].values\n",
"    D.loc[is_train, 'target-1'] = oof.values\n",
"    if h < horizon:\n",
"        D.loc[is_test_next, 'target-1'] = predictions.values\n",
"\n",
"# Store the test predictions in the original dataset\n",
"ds.loc[~D.is_train, 'target'] = D.loc[~D.is_train, 'target'].values\n",
"\n",
"#chime.success()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"t+1 - SMAPE: 1.6249\n",
"t+2 - SMAPE: 1.6964\n",
"t+3 - SMAPE: 1.7736\n",
"t+4 - SMAPE: 1.8577\n",
"t+5 - SMAPE: 1.9731\n",
"t+6 - SMAPE: 2.1009\n",
"t+7 - SMAPE: 2.2476\n",
"t+8 - SMAPE: 2.4071\n"
]
}
],
"source": [
"print('\\n'.join(perf))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"- t+1 - SMAPE: 1.6759\n",
"- t+2 - SMAPE: 1.7387\n",
"- t+3 - SMAPE: 1.8112\n",
"- t+4 - SMAPE: 1.9043\n",
"- t+5 - SMAPE: 2.0242\n",
"- t+6 - SMAPE: 2.1616\n",
"- t+7 - SMAPE: 2.3316\n",
"- t+8 - SMAPE: 2.5400\n",
"\n",
"Public score: 1.2322"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Postprocessing"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"max_active = ds.groupby('cfips')['active'].max()\n",
"last_known_target_values = ds.query('is_train').groupby('cfips')['target'].last().to_dict()\n",
"\n",
"for cfips in max_active[max_active < 100].index:\n",
" ds.loc[ds.cfips.eq(cfips) & ~ds.is_train, 'target'] = last_known_target_values[cfips]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visual checks"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"hovertemplate": "variable=target<br>row_id=%{x}<br>value=%{y}<extra></extra>",
"legendgroup": "target",
"marker": {
"color": "#636efa",
"symbol": "circle"
},
"mode": "markers",
"name": "target",
"orientation": "v",
"showlegend": true,
"type": "scatter",
"x": [
"2019-08-01T00:00:00",
"2019-09-01T00:00:00",
"2019-10-01T00:00:00",
"2019-11-01T00:00:00",
"2019-12-01T00:00:00",
"2020-01-01T00:00:00",
"2020-02-01T00:00:00",
"2020-03-01T00:00:00",
"2020-04-01T00:00:00",
"2020-05-01T00:00:00",
"2020-06-01T00:00:00",
"2020-07-01T00:00:00",
"2020-08-01T00:00:00",
"2020-09-01T00:00:00",
"2020-10-01T00:00:00",
"2020-11-01T00:00:00",
"2020-12-01T00:00:00",
"2021-01-01T00:00:00",
"2021-02-01T00:00:00",
"2021-03-01T00:00:00",
"2021-04-01T00:00:00",
"2021-05-01T00:00:00",
"2021-06-01T00:00:00",
"2021-07-01T00:00:00",
"2021-08-01T00:00:00",
"2021-09-01T00:00:00",
"2021-10-01T00:00:00",
"2021-11-01T00:00:00",
"2021-12-01T00:00:00",
"2022-01-01T00:00:00",
"2022-02-01T00:00:00",
"2022-03-01T00:00:00",
"2022-04-01T00:00:00",
"2022-05-01T00:00:00",
"2022-06-01T00:00:00",
"2022-07-01T00:00:00",
"2022-08-01T00:00:00",
"2022-09-01T00:00:00",
"2022-10-01T00:00:00",
"2022-11-01T00:00:00",
"2022-12-01T00:00:00",
"2023-01-01T00:00:00",
"2023-02-01T00:00:00",
"2023-03-01T00:00:00",
"2023-04-01T00:00:00",
"2023-05-01T00:00:00",
"2023-06-01T00:00:00"
],
"xaxis": "x",
"y": [
1.1863674899999999,
1.198351,
1.2462851,
1.3229796,
1.3229796,
1.3318003,
1.3605442,
1.3605442,
1.4084507,
1.3892881,
1.4180321,
1.4467759,
1.4467759,
1.4371946,
1.4371946,
1.4371946,
1.3605442,
1.3746035,
1.1823512,
1.1535134,
1.1535134,
1.1535134,
1.1727387,
1.1439008,
1.1535134,
1.1342882,
1.1439008,
1.1535134,
1.163126,
1.1787628,
1.1592791,
1.1495372,
1.1300536,
1.1203117,
1.1203117,
1.1592791,
1.1592791,
1.1787628,
1.1690209,
1.1742170243799857,
1.1798589608463734,
1.1777659730942878,
1.1801574873643013,
1.1773674079570413,
1.1800574683838128,
1.1798943150950045,
1.175360447568515
],
"yaxis": "y"
},
{
"hovertemplate": "variable=oof<br>row_id=%{x}<br>value=%{y}<extra></extra>",
"legendgroup": "oof",
"marker": {
"color": "#EF553B",
"symbol": "circle"
},
"mode": "markers",
"name": "oof",
"orientation": "v",
"showlegend": true,
"type": "scatter",
"x": [
"2019-08-01T00:00:00",
"2019-09-01T00:00:00",
"2019-10-01T00:00:00",
"2019-11-01T00:00:00",
"2019-12-01T00:00:00",
"2020-01-01T00:00:00",
"2020-02-01T00:00:00",
"2020-03-01T00:00:00",
"2020-04-01T00:00:00",
"2020-05-01T00:00:00",
"2020-06-01T00:00:00",
"2020-07-01T00:00:00",
"2020-08-01T00:00:00",
"2020-09-01T00:00:00",
"2020-10-01T00:00:00",
"2020-11-01T00:00:00",
"2020-12-01T00:00:00",
"2021-01-01T00:00:00",
"2021-02-01T00:00:00",
"2021-03-01T00:00:00",
"2021-04-01T00:00:00",
"2021-05-01T00:00:00",
"2021-06-01T00:00:00",
"2021-07-01T00:00:00",
"2021-08-01T00:00:00",
"2021-09-01T00:00:00",
"2021-10-01T00:00:00",
"2021-11-01T00:00:00",
"2021-12-01T00:00:00",
"2022-01-01T00:00:00",
"2022-02-01T00:00:00",
"2022-03-01T00:00:00",
"2022-04-01T00:00:00",
"2022-05-01T00:00:00",
"2022-06-01T00:00:00",
"2022-07-01T00:00:00",
"2022-08-01T00:00:00",
"2022-09-01T00:00:00",
"2022-10-01T00:00:00",
"2022-11-01T00:00:00",
"2022-12-01T00:00:00",
"2023-01-01T00:00:00",
"2023-02-01T00:00:00",
"2023-03-01T00:00:00",
"2023-04-01T00:00:00",
"2023-05-01T00:00:00",
"2023-06-01T00:00:00"
],
"xaxis": "x",
"y": [
null,
1.1959122463923015,
1.201825426840018,
1.2516713420314654,
1.3256473439159162,
1.3360062943039246,
1.317595650169866,
1.3658255229837737,
1.3661545595411593,
1.4002722963705525,
1.3872547624875398,
1.415241594726766,
1.4538018058794624,
1.475609697508276,
1.4303118638345949,
1.4277306780456036,
1.4248567903792644,
1.3607091219844947,
1.3486925751617207,
1.188812633121493,
1.1652911294581,
1.1609494986362252,
1.149864238919837,
1.184660282606063,
1.136839376822894,
1.154783244521418,
1.1397259733701888,
1.1397259733701888,
1.1550087765827737,
1.1666870403075489,
1.1854843433482365,
1.1657697783942198,
1.140725970288753,
1.1399989867314626,
1.128819292018842,
1.1269752474025307,
1.1554346005123686,
1.16180678136804,
1.1843054053899484,
null,
null,
null,
null,
null,
null,
null,
null
],
"yaxis": "y"
}
],
"layout": {
"legend": {
"title": {
"text": "variable"
},
"tracegroupgap": 0
},
"shapes": [
{
"fillcolor": "yellow",
"opacity": 0.2,
"type": "rect",
"x0": "2022-11-01T00:00:00",
"x1": "2023-06-01T00:00:00",
"xref": "x",
"y0": 0,
"y1": 1,
"yref": "y domain"
}
],
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"heatmapgl": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmapgl"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"fillpattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"title": {
"text": "17061"
},
"xaxis": {
"anchor": "y",
"domain": [
0,
1
],
"title": {
"text": "row_id"
}
},
"yaxis": {
"anchor": "x",
"domain": [
0,
1
],
"title": {
"text": "value"
}
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
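"# Pick one county (cfips) at random from the training rows.\n",
"cfips = ds.query('is_train').cfips.sample().iloc[0]\n",
"\n",
"# Keep that county's target and attach column 1 of all_oof\n",
"# (assumed here to hold the first-step out-of-fold predictions).\n",
"# row_id has the form '<cfips>_<date>', so split on '_' to match the county.\n",
"series = ds.query('cfips == @cfips')['target'].to_frame('target')\n",
"series['oof'] = all_oof[all_oof.index.map(lambda x: x.split('_')[0] == str(cfips))][1]\n",
"series.index = pd.to_datetime(series.index.map(lambda x: x.split('_')[1]))\n",
"\n",
"# Plot target vs. out-of-fold values and shade the window that runs\n",
"# from one to eight months after the cut-off.\n",
"ax = series.plot(kind='scatter', title=str(cfips))\n",
"ax = ax.add_vrect(x0=cut_off + one_month, x1=cut_off + 8 * one_month, fillcolor='yellow', opacity=0.2)\n",
"ax"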
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submission"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>row_id</th>\n",
" <th>microbusiness_density</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1001_2022-11-01</td>\n",
" <td>3.817671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1003_2022-11-01</td>\n",
" <td>3.817671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1005_2022-11-01</td>\n",
" <td>3.817671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1007_2022-11-01</td>\n",
" <td>3.817671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1009_2022-11-01</td>\n",
" <td>3.817671</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" row_id microbusiness_density\n",
"0 1001_2022-11-01 3.817671\n",
"1 1003_2022-11-01 3.817671\n",
"2 1005_2022-11-01 3.817671\n",
"3 1007_2022-11-01 3.817671\n",
"4 1009_2022-11-01 3.817671"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_sub = pd.read_csv('data/sample_submission.csv')\n",
"sample_sub.head()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>row_id</th>\n",
" <th>microbusiness_density</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1001_2022-11-01</td>\n",
" <td>3.465837</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1003_2022-11-01</td>\n",
" <td>8.371675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1005_2022-11-01</td>\n",
" <td>1.229685</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1007_2022-11-01</td>\n",
" <td>1.297983</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1009_2022-11-01</td>\n",
" <td>1.837880</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" row_id microbusiness_density\n",
"0 1001_2022-11-01 3.465837\n",
"1 1003_2022-11-01 8.371675\n",
"2 1005_2022-11-01 1.229685\n",
"3 1007_2022-11-01 1.297983\n",
"4 1009_2022-11-01 1.837880"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sub = ds[~ds.is_train]['target'].rename('microbusiness_density').reset_index()\n",
"sub.head()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"assert len(sub) == len(sample_sub)\n",
"assert sub.row_id.equals(sample_sub.row_id)\n",
"assert not sub.microbusiness_density.isnull().any()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"sub.to_csv('submission.csv.zip', index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "55fbbcf542e06cc59ad76a1e0d5dc36ee204d6d2b704491656ee6b3487310122"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}