Skip to content

Instantly share code, notes, and snippets.

@min2bro
Created October 8, 2019 06:19
Show Gist options
  • Save min2bro/a6ad219536819d4ffa1e9f83df5b437a to your computer and use it in GitHub Desktop.
Save min2bro/a6ad219536819d4ffa1e9f83df5b437a to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comprehensive Guide on Pandas Datetime"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"##### In this post we will explore the Pandas datetime methods which can be used instantaneously to work with datetime in Pandas.\n",
"\n",
"##### I am sharing the table of content in case you are just interested to see a specific topic then this would help you to jump directly over there"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"Table of Content:\n",
" 1. Datetime Index using date_range - frequency\n",
" 2. Creat Dataframe using Datetime Index\n",
" 3. Import CSV file in a dataframe using parse_date\n",
" 4. Convert date column to datetime object\n",
" 5. Extract data using datetime accessor\n",
" 6. Timeseries Aggreggation using resample\n",
" 7. Indexing and Slicing of datetime Index\n",
" 8. Date Offsets: Adding a day, adding hours etc.\n",
" 9. Advanced Datetime Index functions\n",
" 10. Understanding CustomBusinessDays\n",
" 11. Timezones\n",
" 12. Business Hours and CustomBusinessHours\n",
" 13. Difference Between two dates column"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import time-series data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### This is the monthly electrical consumption data in csv which we will import in a dataframe and data can be downloaded using this link"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### parse_dates attributes in read_csv() function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### We are using parse_date attribute to parse and convert the date columns in the csv files to numpy datetime64 type"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 397 entries, 0 to 396\n",
"Data columns (total 2 columns):\n",
"DATE 397 non-null datetime64[ns]\n",
"IPG2211A2N 397 non-null float64\n",
"dtypes: datetime64[ns](1), float64(1)\n",
"memory usage: 6.3 KB\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"df=pd.read_csv('./Electric_Production.csv',parse_dates=['DATE'])\n",
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pandas to_datetime"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Alternatively, you can use to_datetime to convert any column to datetime"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 397 entries, 0 to 396\n",
"Data columns (total 2 columns):\n",
"DATE 397 non-null datetime64[ns]\n",
"IPG2211A2N 397 non-null float64\n",
"dtypes: datetime64[ns](1), float64(1)\n",
"memory usage: 6.3 KB\n"
]
}
],
"source": [
"df['DATE']=pd.to_datetime(df['DATE'])\n",
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract Month and Year from datetime using datetime accessor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### We will create 3 new columns here for Year, Month and day after extracting it from the Date column"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day\n",
"0 1985-01-01 72.5052 1985 1 1\n",
"1 1985-02-01 70.6720 1985 2 1\n",
"2 1985-03-01 62.4502 1985 3 1\n",
"3 1985-04-01 57.4714 1985 4 1\n",
"4 1985-05-01 55.3151 1985 5 1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Year']=df['DATE'].dt.year\n",
"df['month']=df['DATE'].dt.month\n",
"df['day']=df['DATE'].dt.day\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day\n",
"0 1985-01-01 72.5052 1985 1 1\n",
"1 1985-02-01 70.6720 1985 2 1\n",
"2 1985-03-01 62.4502 1985 3 1\n",
"3 1985-04-01 57.4714 1985 4 1\n",
"4 1985-05-01 55.3151 1985 5 1"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# OR\n",
"\n",
"df['Year']=df['DATE'].apply(lambda x: x.year)\n",
"df['month']=df['DATE'].apply(lambda x: x.month)\n",
"df['day']=df['DATE'].apply(lambda x: x.day)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time Series- Aggregation\n",
"### resample to find sum on the date index date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### resample() is a method in pandas that can be used to summarize data by date or time. \n",
"\n",
"###### Let's find the Yearly sum of Electricity Consumption"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1985-12-31</td>\n",
" <td>745.9880</td>\n",
" <td>23820</td>\n",
" <td>78</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1986-12-31</td>\n",
" <td>752.5187</td>\n",
" <td>23832</td>\n",
" <td>78</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1987-12-31</td>\n",
" <td>788.8833</td>\n",
" <td>23844</td>\n",
" <td>78</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1988-12-31</td>\n",
" <td>836.5963</td>\n",
" <td>23856</td>\n",
" <td>78</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-12-31</td>\n",
" <td>862.7420</td>\n",
" <td>23868</td>\n",
" <td>78</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" IPG2211A2N Year month day\n",
"DATE \n",
"1985-12-31 745.9880 23820 78 12\n",
"1986-12-31 752.5187 23832 78 12\n",
"1987-12-31 788.8833 23844 78 12\n",
"1988-12-31 836.5963 23856 78 12\n",
"1989-12-31 862.7420 23868 78 12"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('DATE').resample('1Y').sum().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### resample to find mean on the date index date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Lets find the Electricity consumption mean for each year"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1985-12-31</td>\n",
" <td>62.165667</td>\n",
" <td>1985.0</td>\n",
" <td>6.5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1986-12-31</td>\n",
" <td>62.709892</td>\n",
" <td>1986.0</td>\n",
" <td>6.5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1987-12-31</td>\n",
" <td>65.740275</td>\n",
" <td>1987.0</td>\n",
" <td>6.5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1988-12-31</td>\n",
" <td>69.716358</td>\n",
" <td>1988.0</td>\n",
" <td>6.5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-12-31</td>\n",
" <td>71.895167</td>\n",
" <td>1989.0</td>\n",
" <td>6.5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" IPG2211A2N Year month day\n",
"DATE \n",
"1985-12-31 62.165667 1985.0 6.5 1.0\n",
"1986-12-31 62.709892 1986.0 6.5 1.0\n",
"1987-12-31 65.740275 1987.0 6.5 1.0\n",
"1988-12-31 69.716358 1988.0 6.5 1.0\n",
"1989-12-31 71.895167 1989.0 6.5 1.0"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('DATE').resample('1Y').mean().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Datetime index and slice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Just ensure that the datetime column is set as index for the dataframe. I am using set_index() function to set that before index and slice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Filter using the date"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1987-01-01</td>\n",
" <td>73.8152</td>\n",
" <td>1987</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1987-02-01</td>\n",
" <td>70.0620</td>\n",
" <td>1987</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" IPG2211A2N Year month day\n",
"DATE \n",
"1987-01-01 73.8152 1987 1 1\n",
"1987-02-01 70.0620 1987 2 1"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('DATE')['1987'].head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Filter all rows between two dates i.e. 1989-JAN and 1995-Apr here"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>1989-01-01</td>\n",
" <td>77.9188</td>\n",
" <td>1989</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-02-01</td>\n",
" <td>76.6822</td>\n",
" <td>1989</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-03-01</td>\n",
" <td>73.3523</td>\n",
" <td>1989</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-04-01</td>\n",
" <td>65.1081</td>\n",
" <td>1989</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1989-05-01</td>\n",
" <td>63.6892</td>\n",
" <td>1989</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" IPG2211A2N Year month day\n",
"DATE \n",
"1989-01-01 77.9188 1989 1 1\n",
"1989-02-01 76.6822 1989 2 1\n",
"1989-03-01 73.3523 1989 3 1\n",
"1989-04-01 65.1081 1989 4 1\n",
"1989-05-01 63.6892 1989 5 1"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('DATE')['1989-01':'1995-04'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Date Offset\n",
"\n",
"#### Its a kind of date increment used for a date range. As per the documentation:\n",
"\n",
"#### DateOffset work as follows. Each offset specify a set of dates that conform to the DateOffset. For example, Bday defines this #### set to be the set of dates that are weekdays (M-F). To test if a date is in the set of a DateOffset dateOffset we can use the #### onOffset method: dateOffset.onOffset(date).\n",
"\n",
"#### If a date is not on a valid date, the rollback and rollforward methods can be used to roll the date to the nearest valid date #### before/after the date.\n",
"\n",
"#### DateOffsets can be created to move dates forward a given number of valid dates. For example, Bday(2) can be added to a #### date to move it two business days forward. If the date does not start on a valid date, first it is moved to a valid date\n",
"\n",
"#### Add one day\n",
"\n",
"#### Here we are adding a day(timedelta of 1 day) to the Date column in dataframe and creating a new column called as next_day"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>next_day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1985-01-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1985-02-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1985-03-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1985-04-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1985-05-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day next_day\n",
"0 1985-01-01 72.5052 1985 1 1 1985-01-02\n",
"1 1985-02-01 70.6720 1985 2 1 1985-02-02\n",
"2 1985-03-01 62.4502 1985 3 1 1985-03-02\n",
"3 1985-04-01 57.4714 1985 4 1 1985-04-02\n",
"4 1985-05-01 55.3151 1985 5 1 1985-05-02"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['next_day']=df['DATE']+pd.Timedelta('1 day')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding a Business day"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Here we are adding a Business day using Bday param, it will add a day between Mon-Fri. if a date is Sat then add a bday will return the next Monday i.e. a Business day instead of a Saturday"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>next_day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1985-01-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1985-02-04</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1985-03-04</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1985-04-02</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1985-05-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day next_day\n",
"0 1985-01-01 72.5052 1985 1 1 1985-01-02\n",
"1 1985-02-01 70.6720 1985 2 1 1985-02-04\n",
"2 1985-03-01 62.4502 1985 3 1 1985-03-04\n",
"3 1985-04-01 57.4714 1985 4 1 1985-04-02\n",
"4 1985-05-01 55.3151 1985 5 1 1985-05-02"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['next_day']=df['DATE'].apply(lambda x: x+pd.offsets.BDay(1))\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add 2 business days"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Addind two days to the current DATE column using days parameter and create a new column day_after"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>next_day</th>\n",
" <th>day_after</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1985-01-02</td>\n",
" <td>1985-01-03</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1985-02-02</td>\n",
" <td>1985-02-03</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1985-03-02</td>\n",
" <td>1985-03-03</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1985-04-02</td>\n",
" <td>1985-04-03</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1985-05-02</td>\n",
" <td>1985-05-03</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day next_day day_after\n",
"0 1985-01-01 72.5052 1985 1 1 1985-01-02 1985-01-03\n",
"1 1985-02-01 70.6720 1985 2 1 1985-02-02 1985-02-03\n",
"2 1985-03-01 62.4502 1985 3 1 1985-03-02 1985-03-03\n",
"3 1985-04-01 57.4714 1985 4 1 1985-04-02 1985-04-03\n",
"4 1985-05-01 55.3151 1985 5 1 1985-05-02 1985-05-03"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['day_after']=df['DATE'].apply(lambda x: x+pd.DateOffset(days=2))\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add next month date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Adding a month to the DATE column using months parameter"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>next_day</th>\n",
" <th>day_after</th>\n",
" <th>next_month_day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1985-01-02</td>\n",
" <td>1985-01-03</td>\n",
" <td>1985-02-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1985-02-04</td>\n",
" <td>1985-02-03</td>\n",
" <td>1985-03-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1985-03-04</td>\n",
" <td>1985-03-03</td>\n",
" <td>1985-04-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1985-04-02</td>\n",
" <td>1985-04-03</td>\n",
" <td>1985-05-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1985-05-02</td>\n",
" <td>1985-05-03</td>\n",
" <td>1985-06-01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day next_day day_after \\\n",
"0 1985-01-01 72.5052 1985 1 1 1985-01-02 1985-01-03 \n",
"1 1985-02-01 70.6720 1985 2 1 1985-02-04 1985-02-03 \n",
"2 1985-03-01 62.4502 1985 3 1 1985-03-04 1985-03-03 \n",
"3 1985-04-01 57.4714 1985 4 1 1985-04-02 1985-04-03 \n",
"4 1985-05-01 55.3151 1985 5 1 1985-05-02 1985-05-03 \n",
"\n",
" next_month_day \n",
"0 1985-02-01 \n",
"1 1985-03-01 \n",
"2 1985-04-01 \n",
"3 1985-05-01 \n",
"4 1985-06-01 "
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['next_month_day']=df['DATE'].apply(lambda x: x+pd.DateOffset(months=1))\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### For the complete list of parameters check this link\n",
"#### https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using date_range to create datetime index\n",
"\n",
"#### it is Immutable numpy ndarray of datetime64 data, We will see how to create datetime index and eventually create a dataframe using these datetime index arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datetime index with Hourly frequency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### It gives the array of date and time starting from '2018-01-01' with a Hourly frequency and period=3 means total elements of 3"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',\n",
" '2018-01-01 02:00:00'],\n",
" dtype='datetime64[ns]', freq='H')"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"dti = pd.date_range('2018-01-01', periods=3, freq='H')\n",
"dti"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monthly Frequency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now change the frequency to Monthly and create total 10 date array"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',\n",
" '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',\n",
" '2018-09-30', '2018-10-31'],\n",
" dtype='datetime64[ns]', freq='M')"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index = pd.date_range('2018-01-01',periods=10, freq='M')\n",
"index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weekly Frequency with start and end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Change the frequency to Weekly and create dates between two dates using start and end dates"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',\n",
" '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',\n",
" '2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24',\n",
" '2019-03-31', '2019-04-07', '2019-04-14', '2019-04-21',\n",
" '2019-04-28'],\n",
" dtype='datetime64[ns]', freq='W-SUN')"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.date_range(start='2019-01-01', end='2019-04-30', freq='W')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datetime index with start and end"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',\n",
" '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',\n",
" '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',\n",
" '2011-01-13', '2011-01-14', '2011-01-15', '2011-01-16',\n",
" '2011-01-17', '2011-01-18', '2011-01-19', '2011-01-20',\n",
" '2011-01-21', '2011-01-22', '2011-01-23', '2011-01-24',\n",
" '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28',\n",
" '2011-01-29', '2011-01-30', '2011-01-31', '2011-02-01'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import datetime\n",
"start = datetime.datetime(2011, 1, 1)\n",
"\n",
"end = datetime.datetime(2011, 2, 1)\n",
"\n",
"index = pd.date_range(start, end)\n",
"index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create dataframe using date time index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create dataframe with datetime as index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Here index: dti is the date_range created above with hourly frequency"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>2018-01-01 00:00:00</td>\n",
" <td>1.005757</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2018-01-01 01:00:00</td>\n",
" <td>15.762388</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2018-01-01 02:00:00</td>\n",
" <td>3.260808</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price\n",
"2018-01-01 00:00:00 1.005757\n",
"2018-01-01 01:00:00 15.762388\n",
"2018-01-01 02:00:00 3.260808"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"df= pd.DataFrame({'price':np.random.uniform(0,20,size=3)},index=dti)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create datafrema with datetime as a column"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" <th>date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>17.763795</td>\n",
" <td>2018-01-01 00:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>4.811412</td>\n",
" <td>2018-01-01 01:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>13.950828</td>\n",
" <td>2018-01-01 02:00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price date\n",
"0 17.763795 2018-01-01 00:00:00\n",
"1 4.811412 2018-01-01 01:00:00\n",
"2 13.950828 2018-01-01 02:00:00"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"df= pd.DataFrame({'price':np.random.uniform(0,20,size=3),'date':dti})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datetime Index Using Holiday Calendar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### You can also use the Holiday calendars to provide the list of holidays. Here we are using freq as US holiday calendar, So the\n",
"#### final datetime index will skip all the dates available in that holiday calendar"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DatetimeIndex(['2019-12-24', '2019-12-26', '2019-12-27', '2019-12-30',\n",
" '2019-12-31'],\n",
" dtype='datetime64[ns]', freq='C')\n"
]
}
],
"source": [
"from pandas.tseries.holiday import USFederalHolidayCalendar\n",
"from pandas.tseries.offsets import CustomBusinessDay\n",
"\n",
"us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())\n",
"print (pd.date_range(start='2019-12-24',end='2019-12-31', freq=us_bd))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datetime Index using Origin Parameter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### You can set the origin date and a list of days as a parameter and add that to origin date. Here the origin is 2019-10-25 \n",
"#### and adding 1 day to it gives 2019-10-26 and similarly adding 2 and 3 gives 2019-10-27 and 2019-10-28 resp"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2019-10-26', '2019-10-27', '2019-10-28'], dtype='datetime64[ns]', freq=None)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('2019-10-25'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Week masking and Holidays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### One of the important feature is Week masking, In Middle eastern countries the working days in a week is thru Sun-Thu and \n",
"#### Fri,Sat is considered as Weekends. So here we are creating a dateindex using such working weeks from Sun-Thu and\n",
"#### list of Holidays"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2011-01-02', '2011-01-03', '2011-01-04', '2011-01-06',\n",
" '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',\n",
" '2011-01-13', '2011-01-16', '2011-01-17', '2011-01-18',\n",
" '2011-01-19', '2011-01-20', '2011-01-23', '2011-01-24',\n",
" '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-30',\n",
" '2011-01-31', '2011-02-01'],\n",
" dtype='datetime64[ns]', freq='C')"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weekmask = 'Sun Mon Tue Wed Thu'\n",
"\n",
"holidays = [datetime.datetime(2011, 1, 5), datetime.datetime(2011, 3, 14)]\n",
"\n",
"pd.bdate_range(start, end, freq='C', weekmask=weekmask, holidays=holidays)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understand Custom Business days"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using CustomBusinessdays you can create the custom business day using the same example of Middle eastern countries\n",
"#### as shown above. This will work exactly the same way as Dateoffset Bday() explained above. \n",
"\n",
"#### As shown in the example here, if we add 2 business days after 2013-04-30(Tue) considering 2013-05-01(Wed) is a holiday as it in holiday #### list, If we add 2 Middle eastern business day to 2013-04-30 then it will return 2013-05-05 since Wed(2013-05-01) is a Holiday\n",
"#### and Fri and Sat is a weekend so the 2nd business day is Sunday i.e. 2013-05-05"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2013-05-05 17:00:00')"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import datetime\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"weekmask_egypt = 'Sun Mon Tue Wed Thu'\n",
"\n",
"holidays = ['2012-05-01',datetime.datetime(2013, 5, 1),np.datetime64('2014-05-01')]\n",
"\n",
"bday_egypt = pd.offsets.CustomBusinessDay(holidays=holidays,weekmask=weekmask_egypt)\n",
"\n",
"# dt = datetime.datetime(2013, 4, 30)\n",
"dt = pd.Timestamp('2013-04-30 17:00')\n",
"dt+2*bday_egypt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using Truncate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Two date attributes after and before is used to filter the records"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>IPG2211A2N</th>\n",
" <th>Year</th>\n",
" <th>month</th>\n",
" <th>day</th>\n",
" <th>next_day</th>\n",
" <th>day_after</th>\n",
" <th>next_month_day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>1985-01-01</td>\n",
" <td>72.5052</td>\n",
" <td>1985</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1985-01-02</td>\n",
" <td>1985-01-03</td>\n",
" <td>1985-02-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1985-02-01</td>\n",
" <td>70.6720</td>\n",
" <td>1985</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1985-02-04</td>\n",
" <td>1985-02-03</td>\n",
" <td>1985-03-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1985-03-01</td>\n",
" <td>62.4502</td>\n",
" <td>1985</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1985-03-04</td>\n",
" <td>1985-03-03</td>\n",
" <td>1985-04-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>1985-04-01</td>\n",
" <td>57.4714</td>\n",
" <td>1985</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1985-04-02</td>\n",
" <td>1985-04-03</td>\n",
" <td>1985-05-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>1985-05-01</td>\n",
" <td>55.3151</td>\n",
" <td>1985</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1985-05-02</td>\n",
" <td>1985-05-03</td>\n",
" <td>1985-06-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>197</td>\n",
" <td>2001-06-01</td>\n",
" <td>90.3955</td>\n",
" <td>2001</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>2001-06-04</td>\n",
" <td>2001-06-03</td>\n",
" <td>2001-07-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>198</td>\n",
" <td>2001-07-01</td>\n",
" <td>96.0740</td>\n",
" <td>2001</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" <td>2001-07-02</td>\n",
" <td>2001-07-03</td>\n",
" <td>2001-08-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>199</td>\n",
" <td>2001-08-01</td>\n",
" <td>99.5534</td>\n",
" <td>2001</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2001-08-02</td>\n",
" <td>2001-08-03</td>\n",
" <td>2001-09-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>200</td>\n",
" <td>2001-09-01</td>\n",
" <td>88.2810</td>\n",
" <td>2001</td>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>2001-09-03</td>\n",
" <td>2001-09-03</td>\n",
" <td>2001-10-01</td>\n",
" </tr>\n",
" <tr>\n",
" <td>201</td>\n",
" <td>2001-10-01</td>\n",
" <td>82.6860</td>\n",
" <td>2001</td>\n",
" <td>10</td>\n",
" <td>1</td>\n",
" <td>2001-10-02</td>\n",
" <td>2001-10-03</td>\n",
" <td>2001-11-01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>202 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" DATE IPG2211A2N Year month day next_day day_after \\\n",
"0 1985-01-01 72.5052 1985 1 1 1985-01-02 1985-01-03 \n",
"1 1985-02-01 70.6720 1985 2 1 1985-02-04 1985-02-03 \n",
"2 1985-03-01 62.4502 1985 3 1 1985-03-04 1985-03-03 \n",
"3 1985-04-01 57.4714 1985 4 1 1985-04-02 1985-04-03 \n",
"4 1985-05-01 55.3151 1985 5 1 1985-05-02 1985-05-03 \n",
".. ... ... ... ... ... ... ... \n",
"197 2001-06-01 90.3955 2001 6 1 2001-06-04 2001-06-03 \n",
"198 2001-07-01 96.0740 2001 7 1 2001-07-02 2001-07-03 \n",
"199 2001-08-01 99.5534 2001 8 1 2001-08-02 2001-08-03 \n",
"200 2001-09-01 88.2810 2001 9 1 2001-09-03 2001-09-03 \n",
"201 2001-10-01 82.6860 2001 10 1 2001-10-02 2001-10-03 \n",
"\n",
" next_month_day \n",
"0 1985-02-01 \n",
"1 1985-03-01 \n",
"2 1985-04-01 \n",
"3 1985-05-01 \n",
"4 1985-06-01 \n",
".. ... \n",
"197 2001-07-01 \n",
"198 2001-08-01 \n",
"199 2001-09-01 \n",
"200 2001-10-01 \n",
"201 2001-11-01 \n",
"\n",
"[202 rows x 8 columns]"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.truncate(after='2019-10')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Timezones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using parameter tz you can set the timezone for the timestamp, You can check the list of pytz timezones"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2016-10-30 00:00:00+0530', tz='Asia/Kolkata')"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts = pd.Timestamp('2016-10-30 00:00:00', tz='Asia/Kolkata')\n",
"ts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### convert the timezone of a timestamp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Convert the timestamp to another timezone using tz_convert"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2016-10-29 20:30:00+0200', tz='Europe/Amsterdam')"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"pd.Timestamp('2016-10-30 00:00:00', tz='Asia/Kolkata').tz_convert('Europe/Amsterdam')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Business Hour"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### default business hour is from 9:00 AM to 5:00PM for 7 hours. Adding 2 business hours returns 11:00AM and adding 8 business hours returns the next day"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2016-10-31 11:00:00+0530', tz='Asia/Kolkata')"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bh = pd.offsets.BusinessHour()\n",
"pd.Timestamp('2016-10-30 00:00:00', tz='Asia/Kolkata')+2*bh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### setting business hour"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### You can also set your own business hours with a start and end time"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bh = pd.offsets.BusinessHour(start='11:00', end=datetime.time(20, 0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Business Hour"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### You can also set a CustomBusinessHours incorporating the Holiday Calendar list with a start and end business hours and\n",
"#### weekmask as explained above"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'CustomBusinessHour' is not defined",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-71-df4d1692dbac>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# Makr this Italic\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mCustomBusinessHour\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mn\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mweekmask\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'Mon Tue Wed Thu Fri'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mholidays\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcalendar\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstart\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'09:00'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mend\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'17:00'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0moffset\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdatetime\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtimedelta\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mNameError\u001b[0m: name 'CustomBusinessHour' is not defined"
]
}
],
"source": [
"# Makr this Italic\n",
"CustomBusinessHour(n=1, normalize=False, weekmask='Mon Tue Wed Thu Fri', holidays=None, calendar=None, start='09:00', end='17:00', offset=datetime.timedelta(0))"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2014-01-17 16:00:00')"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import datetime\n",
"from pandas.tseries.holiday import USFederalHolidayCalendar\n",
"dt = datetime.datetime(2014, 1, 17, 15)\n",
"\n",
"bhour_us = pd.offsets.CustomBusinessHour(calendar=USFederalHolidayCalendar(),start='11:00', end=datetime.time(20, 0),\n",
" weekmask='Mon Tue Wed Thu Fri')\n",
"dt+bhour_us"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Difference between two date columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Lets see how to find difference between two datetime columns in dataframe in terms of no of days, seconds etc"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Letter</th>\n",
" <th>First_Day</th>\n",
" <th>Last_Day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>A</td>\n",
" <td>2019-10-06 12:25:53</td>\n",
" <td>2019-10-04 10:10:53</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>A</td>\n",
" <td>2019-10-04 10:10:53</td>\n",
" <td>2019-10-01 08:10:53</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>B</td>\n",
" <td>2019-10-01 08:10:53</td>\n",
" <td>2019-09-23 01:24:53</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>B</td>\n",
" <td>2019-09-23 01:24:53</td>\n",
" <td>2019-09-23 15:58:17</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Letter First_Day Last_Day\n",
"0 A 2019-10-06 12:25:53 2019-10-04 10:10:53\n",
"1 A 2019-10-04 10:10:53 2019-10-01 08:10:53\n",
"2 B 2019-10-01 08:10:53 2019-09-23 01:24:53\n",
"3 B 2019-09-23 01:24:53 2019-09-23 15:58:17"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from datetime import datetime\n",
"import numpy as np\n",
"\n",
"# create dataframe\n",
"df = pd.DataFrame(data=[['A', '2019-10-06T12:25:53', '2019-10-04T10:10:53'],\n",
" ['A', '2019-10-04T10:10:53', '2019-10-01T08:10:53'],\n",
" ['B', '2019-10-01T08:10:53', '2019-09-23T01:24:53'],\n",
" ['B', '2019-09-23T01:24:53', '2019-09-23T15:58:17']],\n",
" columns=['Letter', 'First_Day', 'Last_Day'])\n",
"\n",
"df['First_Day']=pd.to_datetime(df['First_Day'])\n",
"df['Last_Day']=pd.to_datetime(df['Last_Day'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Difference between two dates in days and seconds"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['diff']=(pd.to_datetime(df['First_Day']) - pd.to_datetime(df['Last_Day'])).dt.days\n",
"df['diff_time_delta']=df['First_Day']-df['Last_Day']\n",
"df['diff-simple_subtract']=((df['First_Day']-df['Last_Day']).dt.total_seconds())//3600"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Letter</th>\n",
" <th>First_Day</th>\n",
" <th>Last_Day</th>\n",
" <th>diff</th>\n",
" <th>diff_time_delta</th>\n",
" <th>diff-simple_subtract</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>A</td>\n",
" <td>2019-10-06 12:25:53</td>\n",
" <td>2019-10-04 10:10:53</td>\n",
" <td>2</td>\n",
" <td>2 days 02:15:00</td>\n",
" <td>50.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>A</td>\n",
" <td>2019-10-04 10:10:53</td>\n",
" <td>2019-10-01 08:10:53</td>\n",
" <td>3</td>\n",
" <td>3 days 02:00:00</td>\n",
" <td>74.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>B</td>\n",
" <td>2019-10-01 08:10:53</td>\n",
" <td>2019-09-23 01:24:53</td>\n",
" <td>8</td>\n",
" <td>8 days 06:46:00</td>\n",
" <td>198.0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>B</td>\n",
" <td>2019-09-23 01:24:53</td>\n",
" <td>2019-09-23 15:58:17</td>\n",
" <td>-1</td>\n",
" <td>-1 days +09:26:36</td>\n",
" <td>-15.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Letter First_Day Last_Day diff diff_time_delta \\\n",
"0 A 2019-10-06 12:25:53 2019-10-04 10:10:53 2 2 days 02:15:00 \n",
"1 A 2019-10-04 10:10:53 2019-10-01 08:10:53 3 3 days 02:00:00 \n",
"2 B 2019-10-01 08:10:53 2019-09-23 01:24:53 8 8 days 06:46:00 \n",
"3 B 2019-09-23 01:24:53 2019-09-23 15:58:17 -1 -1 days +09:26:36 \n",
"\n",
" diff-simple_subtract \n",
"0 50.0 \n",
"1 74.0 \n",
"2 198.0 \n",
"3 -15.0 "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment