@pietroppeter
Last active May 3, 2022 13:03
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Goal\n",
"Reproduce locally the work that is done in [M5 example](https://colab.research.google.com/drive/1pmp4rqiwiPL-ambxTrJGBiNMS-7vm3v6) colab notebook.\n",
"- avoid requirement of s3 storage\n",
"- avoid requirement of access to AutoTS API\n",
"\n",
"the result should be a local way to compute a best-in-class M5 forecast using [nixtla](https://github.com/Nixtla/nixtla) open source libraries."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"- **target**: time-series variable of interest. Must have three columns: unique_id, datestamp and value.\n",
"- **static**: exogenous static features for each unique_id. Must have unique_id and features in columns.\n",
"- **temporal**: exogenous temporal features. Must have unique_id, datestamp and values for each feature.\n",
"- **calendar-holidays**: dictionary with holiday name and dates with occurrences\n",
"\n",
"Data is taken from https://github.com/Nixtla/nixtla/tree/main/sdk/python-autotimeseries/data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"data_dir = \"data\"\n",
"filename_target = f\"{data_dir}/target.parquet\"\n",
"filename_static = f\"{data_dir}/static.parquet\"\n",
"filename_temporal = f\"{data_dir}/temporal.parquet\"\n",
"filename_calendar_holidays = f\"{data_dir}/calendar-holidays.txt\"\n",
"\n",
"# outputs:\n",
"filename_calendar_features = f\"{data_dir}/calendar-features.parquet\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"with open(filename_calendar_holidays) as f:\n",
" calendar_holidays_raw = f.read()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Chanukah_End=2011-12-28,2012-12-16,2013-12-05,2014-12-24,2015-12-14/Christmas=2011-12-25,2012-12-25,2013-12-25,2014-12-25,2015-12-25/Cinco_De_Mayo=2011-05-05,2012-05-05,2013-05-05,2014-05-05,2015-05-05,2016-05-05/ColumbusDay=2011-10-10,2012-10-08,2013-10-14,2014-10-13,2015-10-12/Easter=2011-04-24,2012-04-08,2013-03-31,2014-04-20,2015-04-05,2016-03-27/Eid_al-Fitr=2011-08-31,2012-08-19,2013-08-08,2014-07-29,2015-07-18/EidAlAdha=2011-11-07,2012-10-26,2013-10-15,2014-10-04,2015-09-24/Father's_day=2011-06-19,2012-06-17,2013-06-16,2014-06-15,2015-06-21,2016-06-19/Halloween=2011-10-31,2012-10-31,2013-10-31,2014-10-31,2015-10-31/IndependenceDay=2011-07-04,2012-07-04,2013-07-04,2014-07-04,2015-07-04/LaborDay=2011-09-05,2012-09-03,2013-09-02,2014-09-01,2015-09-07/LentStart=2011-03-09,2012-02-22,2013-02-13,2014-03-05,2015-02-18,2016-02-10/LentWeek2=2011-03-16,2012-02-29,2013-02-20,2014-03-12,2015-02-25,2016-02-17/MartinLutherKingDay=2012-01-16,2013-01-21,2014-01-20,2015-01-19,2016-01-18/MemorialDay=2011-05-30,2012-05-28,2013-05-27,2014-05-26,2015-05-25,2016-05-30/Mother's_day=2011-05-08,2012-05-13,2013-05-12,2014-05-11,2015-05-10,2016-05-08/NBAFinalsEnd=2011-06-12,2012-06-21,2013-06-20,2014-06-15,2015-06-16,2016-06-19/NBAFinalsStart=2011-05-31,2012-06-12,2013-06-06,2014-06-05,2015-06-04,2016-06-02/NewYear=2012-01-01,2013-01-01,2014-01-01,2015-01-01,2016-01-01/OrthodoxChristmas=2012-01-07,2013-01-07,2014-01-07,2015-01-07,2016-01-07/OrthodoxEaster=2011-04-24,2012-04-15,2013-05-05,2014-04-20,2015-04-12,2016-05-01/Pesach_End=2011-04-26,2012-04-14,2013-04-02,2014-04-22,2015-04-11,2016-04-30/PresidentsDay=2011-02-21,2012-02-20,2013-02-18,2014-02-17,2015-02-16,2016-02-15/Purim_End=2011-03-20,2012-03-08,2013-02-24,2014-03-16,2015-03-05,2016-03-24/Ramadan_starts=2011-08-01,2012-07-20,2013-07-09,2014-06-29,2015-06-18,2016-06-07/StPatricksDay=2011-03-17,2012-03-17,2013-03-17,2014-03-17,2015-03-17,2016-03-17/SuperBowl=2011-02-06,2012-02-05,2013-02-03,2014-02-02,2015-02-01,2016-02-07/Th
anksgiving=2011-11-24,2012-11-22,2013-11-28,2014-11-27,2015-11-26/ValentinesDay=2011-02-14,2012-02-14,2013-02-14,2014-02-14,2015-02-14,2016-02-14/VeteransDay=2011-11-11,2012-11-11,2013-11-11,2014-11-11,2015-11-11\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"calendar_holidays_raw"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"target = pd.read_parquet(filename_target)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item_id</th>\n",
" <th>timestamp</th>\n",
" <th>demand</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-29</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-30</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-31</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-02-01</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-02-02</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item_id timestamp demand\n",
"0 FOODS_1_001_CA_1 2011-01-29 3.0\n",
"1 FOODS_1_001_CA_1 2011-01-30 0.0\n",
"2 FOODS_1_001_CA_1 2011-01-31 0.0\n",
"3 FOODS_1_001_CA_1 2011-02-01 1.0\n",
"4 FOODS_1_001_CA_1 2011-02-02 4.0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 46796220 entries, 0 to 46796219\n",
"Data columns (total 3 columns):\n",
" # Column Dtype \n",
"--- ------ ----- \n",
" 0 item_id category \n",
" 1 timestamp datetime64[ns]\n",
" 2 demand float32 \n",
"dtypes: category(1), datetime64[ns](1), float32(1)\n",
"memory usage: 626.3 MB\n"
]
}
],
"source": [
"target.info()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30490"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(target.item_id.unique())"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item_id</th>\n",
" <th>timestamp</th>\n",
" <th>snap_CA</th>\n",
" <th>snap_TX</th>\n",
" <th>snap_WI</th>\n",
" <th>sell_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-29</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-30</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-01-31</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-02-01</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>FOODS_1_001_CA_1</td>\n",
" <td>2011-02-02</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item_id timestamp snap_CA snap_TX snap_WI sell_price\n",
"0 FOODS_1_001_CA_1 2011-01-29 0 0 0 2.0\n",
"1 FOODS_1_001_CA_1 2011-01-30 0 0 0 2.0\n",
"2 FOODS_1_001_CA_1 2011-01-31 0 0 0 2.0\n",
"3 FOODS_1_001_CA_1 2011-02-01 1 1 0 2.0\n",
"4 FOODS_1_001_CA_1 2011-02-02 1 0 1 2.0"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temporal = pd.read_parquet(filename_temporal)\n",
"temporal.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 47649940 entries, 0 to 47649939\n",
"Data columns (total 6 columns):\n",
" # Column Dtype \n",
"--- ------ ----- \n",
" 0 item_id category \n",
" 1 timestamp datetime64[ns]\n",
" 2 snap_CA uint8 \n",
" 3 snap_TX uint8 \n",
" 4 snap_WI uint8 \n",
" 5 sell_price float32 \n",
"dtypes: category(1), datetime64[ns](1), float32(1), uint8(3)\n",
"memory usage: 774.0 MB\n"
]
}
],
"source": [
"temporal.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calendar features\n",
"\n",
"code called in autotimeseries for tsfeatures is https://github.com/Nixtla/nixtla/blob/main/tsfeatures/features/make_features.py\n",
"which depends on ts_features (also mlforecast, tsfresh). _not actually used directly in the colab notebook_\n",
"\n",
"code called in autotimeseries for calendartsfeatures is https://github.com/Nixtla/nixtla/blob/main/tsfeatures/calendar/make_holidays.py\n",
"which depends on holidays. made a copy of this in a `src` folder"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from src.make_holidays import CalendarFeatures"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from rich import inspect"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #000080; text-decoration-color: #000080\">╭────────────────────── </span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">&lt;</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">class</span><span style=\"color: #000000; text-decoration-color: #000000\"> </span><span style=\"color: #008000; text-decoration-color: #008000\">'src.make_holidays.CalendarFeatures'</span><span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">&gt;</span><span style=\"color: #000080; text-decoration-color: #000080\"> ───────────────────────╮</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; font-style: italic\">class </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">CalendarFeatures</span><span style=\"font-weight: bold\">(</span>filename: str, filename_output: str, country: str, events: <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> Dict<span style=\"font-weight: bold\">[</span>str, List<span style=\"font-weight: bold\">[</span>str<span style=\"font-weight: bold\">]]</span>, scale: bool, unique_id_column: str, ds_column: str, y_column: str<span style=\"font-weight: bold\">)</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> -&gt; <span style=\"color: #008000; text-decoration-color: #008000\">'CalendarFeatures'</span>: <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080\">Computes calendar features.</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">│</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span><span style=\"font-style: italic\"> attribute(s) not shown.</span> Run <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">inspect</span><span style=\"font-weight: bold\">(</span>inspect<span style=\"font-weight: bold\">)</span> for options. <span style=\"color: #000080; text-decoration-color: #000080\">│</span>\n",
"<span style=\"color: #000080; text-decoration-color: #000080\">╰───────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[34m╭─\u001b[0m\u001b[34m───────────────────── \u001b[0m\u001b[1;34m<\u001b[0m\u001b[1;95mclass\u001b[0m\u001b[39m \u001b[0m\u001b[32m'src.make_holidays.CalendarFeatures'\u001b[0m\u001b[1;34m>\u001b[0m\u001b[34m ──────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n",
"\u001b[34m│\u001b[0m \u001b[3;96mclass \u001b[0m\u001b[1;31mCalendarFeatures\u001b[0m\u001b[1m(\u001b[0mfilename: str, filename_output: str, country: str, events: \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m Dict\u001b[1m[\u001b[0mstr, List\u001b[1m[\u001b[0mstr\u001b[1m]\u001b[0m\u001b[1m]\u001b[0m, scale: bool, unique_id_column: str, ds_column: str, y_column: str\u001b[1m)\u001b[0m \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m -> \u001b[32m'CalendarFeatures'\u001b[0m: \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m \u001b[36mComputes calendar features.\u001b[0m \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n",
"\u001b[34m│\u001b[0m \u001b[1;36m27\u001b[0m\u001b[3m attribute(s) not shown.\u001b[0m Run \u001b[1;35minspect\u001b[0m\u001b[1m(\u001b[0minspect\u001b[1m)\u001b[0m for options. \u001b[34m│\u001b[0m\n",
"\u001b[34m╰───────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"inspect(CalendarFeatures)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"this is the call that we are trying to reproduce locally\n",
"\n",
"```python\n",
"response_calendar = autotimeseries.calendartsfeatures(filename=filename_temporal,\n",
" country='USA',\n",
" events=filename_calendar_holidays,\n",
" **columns)\n",
"```\n",
"\n",
"from [autotimeseries.api.main#L73](https://github.com/Nixtla/nixtla/blob/74e4560f1bdb6bf64445f3c45005fe74c0a0a427/api/main.py#L73):\n",
"\n",
"```python\n",
"@app.post('/calendartsfeatures')\n",
"def compute_calendartsfeatures(s3_args: S3Args, args: CalendarTSFeaturesArgs):\n",
" \"\"\"Calculates features using sagemaker.\"\"\"\n",
" sagemaker_response = run_sagemaker(url=s3_args.s3_url,\n",
" dest_url=s3_args.s3_dest_url,\n",
" output_name=f'calendar-features.csv',\n",
" script='calendar/make_holidays.py',\n",
" arguments=parse_args(args))\n",
"\n",
" return sagemaker_response\n",
"```\n",
"\n",
"note that `events` comes from calendar_holidays but needs to be processed as a `Dict[str, list]`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Chanukah_End': ['2011-12-28',\n",
" '2012-12-16',\n",
" '2013-12-05',\n",
" '2014-12-24',\n",
" '2015-12-14'],\n",
" 'Christmas': ['2011-12-25',\n",
" '2012-12-25',\n",
" '2013-12-25',\n",
" '2014-12-25',\n",
" '2015-12-25'],\n",
" 'Cinco_De_Mayo': ['2011-05-05',\n",
" '2012-05-05',\n",
" '2013-05-05',\n",
" '2014-05-05',\n",
" '2015-05-05',\n",
" '2016-05-05'],\n",
" 'ColumbusDay': ['2011-10-10',\n",
" '2012-10-08',\n",
" '2013-10-14',\n",
" '2014-10-13',\n",
" '2015-10-12'],\n",
" 'Easter': ['2011-04-24',\n",
" '2012-04-08',\n",
" '2013-03-31',\n",
" '2014-04-20',\n",
" '2015-04-05',\n",
" '2016-03-27'],\n",
" 'Eid_al-Fitr': ['2011-08-31',\n",
" '2012-08-19',\n",
" '2013-08-08',\n",
" '2014-07-29',\n",
" '2015-07-18'],\n",
" 'EidAlAdha': ['2011-11-07',\n",
" '2012-10-26',\n",
" '2013-10-15',\n",
" '2014-10-04',\n",
" '2015-09-24'],\n",
" \"Father's_day\": ['2011-06-19',\n",
" '2012-06-17',\n",
" '2013-06-16',\n",
" '2014-06-15',\n",
" '2015-06-21',\n",
" '2016-06-19'],\n",
" 'Halloween': ['2011-10-31',\n",
" '2012-10-31',\n",
" '2013-10-31',\n",
" '2014-10-31',\n",
" '2015-10-31'],\n",
" 'IndependenceDay': ['2011-07-04',\n",
" '2012-07-04',\n",
" '2013-07-04',\n",
" '2014-07-04',\n",
" '2015-07-04'],\n",
" 'LaborDay': ['2011-09-05',\n",
" '2012-09-03',\n",
" '2013-09-02',\n",
" '2014-09-01',\n",
" '2015-09-07'],\n",
" 'LentStart': ['2011-03-09',\n",
" '2012-02-22',\n",
" '2013-02-13',\n",
" '2014-03-05',\n",
" '2015-02-18',\n",
" '2016-02-10'],\n",
" 'LentWeek2': ['2011-03-16',\n",
" '2012-02-29',\n",
" '2013-02-20',\n",
" '2014-03-12',\n",
" '2015-02-25',\n",
" '2016-02-17'],\n",
" 'MartinLutherKingDay': ['2012-01-16',\n",
" '2013-01-21',\n",
" '2014-01-20',\n",
" '2015-01-19',\n",
" '2016-01-18'],\n",
" 'MemorialDay': ['2011-05-30',\n",
" '2012-05-28',\n",
" '2013-05-27',\n",
" '2014-05-26',\n",
" '2015-05-25',\n",
" '2016-05-30'],\n",
" \"Mother's_day\": ['2011-05-08',\n",
" '2012-05-13',\n",
" '2013-05-12',\n",
" '2014-05-11',\n",
" '2015-05-10',\n",
" '2016-05-08'],\n",
" 'NBAFinalsEnd': ['2011-06-12',\n",
" '2012-06-21',\n",
" '2013-06-20',\n",
" '2014-06-15',\n",
" '2015-06-16',\n",
" '2016-06-19'],\n",
" 'NBAFinalsStart': ['2011-05-31',\n",
" '2012-06-12',\n",
" '2013-06-06',\n",
" '2014-06-05',\n",
" '2015-06-04',\n",
" '2016-06-02'],\n",
" 'NewYear': ['2012-01-01',\n",
" '2013-01-01',\n",
" '2014-01-01',\n",
" '2015-01-01',\n",
" '2016-01-01'],\n",
" 'OrthodoxChristmas': ['2012-01-07',\n",
" '2013-01-07',\n",
" '2014-01-07',\n",
" '2015-01-07',\n",
" '2016-01-07'],\n",
" 'OrthodoxEaster': ['2011-04-24',\n",
" '2012-04-15',\n",
" '2013-05-05',\n",
" '2014-04-20',\n",
" '2015-04-12',\n",
" '2016-05-01'],\n",
" 'Pesach_End': ['2011-04-26',\n",
" '2012-04-14',\n",
" '2013-04-02',\n",
" '2014-04-22',\n",
" '2015-04-11',\n",
" '2016-04-30'],\n",
" 'PresidentsDay': ['2011-02-21',\n",
" '2012-02-20',\n",
" '2013-02-18',\n",
" '2014-02-17',\n",
" '2015-02-16',\n",
" '2016-02-15'],\n",
" 'Purim_End': ['2011-03-20',\n",
" '2012-03-08',\n",
" '2013-02-24',\n",
" '2014-03-16',\n",
" '2015-03-05',\n",
" '2016-03-24'],\n",
" 'Ramadan_starts': ['2011-08-01',\n",
" '2012-07-20',\n",
" '2013-07-09',\n",
" '2014-06-29',\n",
" '2015-06-18',\n",
" '2016-06-07'],\n",
" 'StPatricksDay': ['2011-03-17',\n",
" '2012-03-17',\n",
" '2013-03-17',\n",
" '2014-03-17',\n",
" '2015-03-17',\n",
" '2016-03-17'],\n",
" 'SuperBowl': ['2011-02-06',\n",
" '2012-02-05',\n",
" '2013-02-03',\n",
" '2014-02-02',\n",
" '2015-02-01',\n",
" '2016-02-07'],\n",
" 'Thanksgiving': ['2011-11-24',\n",
" '2012-11-22',\n",
" '2013-11-28',\n",
" '2014-11-27',\n",
" '2015-11-26'],\n",
" 'ValentinesDay': ['2011-02-14',\n",
" '2012-02-14',\n",
" '2013-02-14',\n",
" '2014-02-14',\n",
" '2015-02-14',\n",
" '2016-02-14'],\n",
" 'VeteransDay': ['2011-11-11',\n",
" '2012-11-11',\n",
" '2013-11-11',\n",
" '2014-11-11',\n",
" '2015-11-11']}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"calendar_holidays = {e[0]: e[1].split(\",\") for e in [event.split(\"=\") for event in calendar_holidays_raw.split(\"/\")]}\n",
"calendar_holidays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"had to put this in make_holidays (otherwise it cannot be used as a library):\n",
"```python\n",
"import logging\n",
"logging.basicConfig(level=logging.INFO)\n",
"logger = logging.getLogger(__name__)\n",
"```\n",
"\n",
"also had to change the references to directory \"/opt/ml/processing/output/\" in `reader` and `writer`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:src.make_holidays:Reading file...\n",
"INFO:src.make_holidays:File read.\n"
]
}
],
"source": [
"calendarfeatures = CalendarFeatures(\n",
" filename=filename_temporal,\n",
" filename_output=filename_calendar_features,\n",
" country=\"USA\",\n",
" events=calendar_holidays,\n",
" scale=False,\n",
" unique_id_column=\"item_id\",\n",
" ds_column=\"timestamp\",\n",
" y_column=\"\" # not used, removed in make_holidays code the only occurence (in renamer)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:src.make_holidays:Computing features...\n"
]
},
{
"ename": "UFuncTypeError",
"evalue": "Cannot cast ufunc 'greater' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mUFuncTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-16-2c703a4c6e68>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mcalendarfeatures\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_calendar_features\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mget_calendar_features\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 203\u001b[0m \u001b[0myear_list\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mlist\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmin_year\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_year\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 204\u001b[0m \u001b[0mcountry\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcountry\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 205\u001b[1;33m events=self.events)\n\u001b[0m\u001b[0;32m 206\u001b[0m \u001b[1;31m# hack, it should be an argument\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 207\u001b[0m \u001b[0mholidays\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mholidays\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mint\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mmake_holidays_distance_df\u001b[1;34m(dates, year_list, country, events)\u001b[0m\n\u001b[0;32m 131\u001b[0m \u001b[0mholiday_dates\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mholiday_dates\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtolist\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 132\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 133\u001b[1;33m \u001b[0mdistance_dict\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mholiday\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdistance_to_holiday\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mholiday_dates\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdates\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 134\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 135\u001b[0m \u001b[0mholidays_distance_df\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance_dict\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32mP:\\Development\\ppeterlongo\\m5\\nixtla\\src\\make_holidays.py\u001b[0m in \u001b[0;36mdistance_to_holiday\u001b[1;34m(holiday_dates, dates)\u001b[0m\n\u001b[0;32m 101\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mabs\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 102\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 103\u001b[1;33m \u001b[0mdistance\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mdistance\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m183\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m365\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mdistance\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mdistance\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m183\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 104\u001b[0m \u001b[1;31m# Convert to minutes\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 105\u001b[0m \u001b[0mdistance\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdistance\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mastype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfloat\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mUFuncTypeError\u001b[0m: Cannot cast ufunc 'greater' input 0 from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'"
]
}
],
"source": [
"calendarfeatures.get_calendar_features()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"known issue (blocker): https://github.com/Nixtla/nixtla/issues/15"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:m5]",
"language": "python",
"name": "conda-env-m5-py"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
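
A note on the blocker in the notebook's last executed cell: the traceback shows `distance[distance > 183]` being evaluated while `distance` is still a `timedelta64[ns]` array, which the NumPy version used here refuses to compare with a plain integer. One possible workaround, sketched under assumptions, is to convert the differences to whole days before applying the threshold. The function name `distance_to_holiday_days` and its body are hypothetical; this is not the upstream `make_holidays.py` code, nor the fix tracked in the linked issue. Only the `183`/`365` wrap-around rule is taken from the traceback.

```python
import numpy as np


def distance_to_holiday_days(holiday_dates, dates):
    """Distance in days from each date to the nearest holiday occurrence.

    Hypothetical helper (an assumption, not the upstream implementation).
    Accepts ISO date strings or NumPy datetime64 values.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    holiday_dates = np.asarray(holiday_dates, dtype="datetime64[D]")

    # Pairwise differences, shape (n_dates, n_holidays), dtype timedelta64[D].
    distance = dates[:, None] - holiday_dates[None, :]

    # Cast to plain integer days BEFORE comparing with integer thresholds;
    # this is the step that avoids the timedelta-vs-int comparison.
    distance = np.abs(distance.astype("int64"))
    distance = distance.min(axis=1)

    # Wrap-around rule copied from the traceback: distances above half a
    # year are measured from the other side of the year.
    far = distance > 183
    distance[far] = 365 - distance[far]
    return distance.astype(float)


christmas = ["2011-12-25", "2012-12-25"]
print(distance_to_holiday_days(christmas, ["2011-12-24", "2011-12-26", "2011-06-20"]))
```

With Christmas dates as the holiday, `2011-06-20` is 188 days from the nearest occurrence, so the wrap-around rule maps it to 365 - 188 = 177. As in the original rule, dates more than 365 days from every occurrence would produce a negative value, so the event list should cover the full date range.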