Skip to content

Instantly share code, notes, and snippets.

@nipunbatra
Last active December 18, 2015 02:58
Show Gist options
  • Save nipunbatra/5714493 to your computer and use it in GitHub Desktop.
Save nipunbatra/5714493 to your computer and use it in GitHub Desktop.
This IPython notebook illustrates handling epoch timestamps in Pandas
{
"metadata": {
"name": "pandas_epoch"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this IPython notebook we shall learn how to use pandas to analyse timeseries CSV data with epoch based timestamps. For the uninitiated epoch timestamps represents the number of seconds since Jan 1, 1970 and is commonly used for data recording in mant scientific applications. Unix command **date** can be used to give the current time and we can also extract the current epoch timestamp as follows:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!date"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Thu Jun 6 09:01:16 IST 2013\r\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!date +%s"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1370489477\r\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now create a dummy dataset using pandas containing 10 rows and 2 columns and save it locally as a CSV."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"print pd.__version__\n",
"import datetime,time\n",
"from pandas import DataFrame\n",
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.11.0\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"start_time=int(time.time())\n",
"timestamp=start_time+np.arange(10)\n",
"column_1=np.random.rand(10)\n",
"column_2=np.random.rand(10)\n",
"df=DataFrame({'timestamp':timestamp,'column_1':column_1,'column_2':column_2})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>column_1</th>\n",
" <th>column_2</th>\n",
" <th>timestamp</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.620274</td>\n",
" <td> 0.720347</td>\n",
" <td> 1370489479</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.603379</td>\n",
" <td> 0.214927</td>\n",
" <td> 1370489480</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.508877</td>\n",
" <td> 0.562860</td>\n",
" <td> 1370489481</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 0.236682</td>\n",
" <td> 0.062259</td>\n",
" <td> 1370489482</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 0.923264</td>\n",
" <td> 0.278048</td>\n",
" <td> 1370489483</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0.105673</td>\n",
" <td> 0.246425</td>\n",
" <td> 1370489484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 0.204324</td>\n",
" <td> 0.770854</td>\n",
" <td> 1370489485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 0.206739</td>\n",
" <td> 0.702672</td>\n",
" <td> 1370489486</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> 0.180858</td>\n",
" <td> 0.440023</td>\n",
" <td> 1370489487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> 0.269232</td>\n",
" <td> 0.684387</td>\n",
" <td> 1370489488</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 6,
"text": [
" column_1 column_2 timestamp\n",
"0 0.620274 0.720347 1370489479\n",
"1 0.603379 0.214927 1370489480\n",
"2 0.508877 0.562860 1370489481\n",
"3 0.236682 0.062259 1370489482\n",
"4 0.923264 0.278048 1370489483\n",
"5 0.105673 0.246425 1370489484\n",
"6 0.204324 0.770854 1370489485\n",
"7 0.206739 0.702672 1370489486\n",
"8 0.180858 0.440023 1370489487\n",
"9 0.269232 0.684387 1370489488"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Saving the dataframe into a CSV file"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.to_csv('epoch_demo.csv',index=False)\n",
"!cat epoch_demo.csv"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"column_1,column_2,timestamp\r\n",
"0.6202741436145147,0.7203473783163402,1370489479\r\n",
"0.603378652829255,0.2149274411077019,1370489480\r\n",
"0.5088772165417312,0.5628600230105181,1370489481\r\n",
"0.23668184075365228,0.062259281774838415,1370489482\r\n",
"0.923263574664376,0.27804805708797475,1370489483\r\n",
"0.10567266566500166,0.24642510043158228,1370489484\r\n",
"0.20432401714102266,0.7708544521037892,1370489485\r\n",
"0.20673911644596843,0.7026719669208518,1370489486\r\n",
"0.1808583958327319,0.4400233442122533,1370489487\r\n",
"0.2692318429010231,0.684387368168343,1370489488\r\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_2=pd.read_csv('epoch_demo.csv',index_col=2)\n",
"df_2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>column_1</th>\n",
" <th>column_2</th>\n",
" </tr>\n",
" <tr>\n",
" <th>timestamp</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1370489479</th>\n",
" <td> 0.620274</td>\n",
" <td> 0.720347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489480</th>\n",
" <td> 0.603379</td>\n",
" <td> 0.214927</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489481</th>\n",
" <td> 0.508877</td>\n",
" <td> 0.562860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489482</th>\n",
" <td> 0.236682</td>\n",
" <td> 0.062259</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489483</th>\n",
" <td> 0.923264</td>\n",
" <td> 0.278048</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489484</th>\n",
" <td> 0.105673</td>\n",
" <td> 0.246425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489485</th>\n",
" <td> 0.204324</td>\n",
" <td> 0.770854</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489486</th>\n",
" <td> 0.206739</td>\n",
" <td> 0.702672</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489487</th>\n",
" <td> 0.180858</td>\n",
" <td> 0.440023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1370489488</th>\n",
" <td> 0.269232</td>\n",
" <td> 0.684387</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 8,
"text": [
" column_1 column_2\n",
"timestamp \n",
"1370489479 0.620274 0.720347\n",
"1370489480 0.603379 0.214927\n",
"1370489481 0.508877 0.562860\n",
"1370489482 0.236682 0.062259\n",
"1370489483 0.923264 0.278048\n",
"1370489484 0.105673 0.246425\n",
"1370489485 0.204324 0.770854\n",
"1370489486 0.206739 0.702672\n",
"1370489487 0.180858 0.440023\n",
"1370489488 0.269232 0.684387"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can notice that currently the index is of integer type. To convert it into DateTime type we use tp_datetime() function, which accepts time in nanoseconds. \n",
"$$1 second = 10^9 nanoseconds$$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_2.index=pd.to_datetime((df_2.index.values*1e9).astype(int))\n",
"df_2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>column_1</th>\n",
" <th>column_2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-06-06 03:31:19</th>\n",
" <td> 0.620274</td>\n",
" <td> 0.720347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:20</th>\n",
" <td> 0.603379</td>\n",
" <td> 0.214927</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:21</th>\n",
" <td> 0.508877</td>\n",
" <td> 0.562860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:22</th>\n",
" <td> 0.236682</td>\n",
" <td> 0.062259</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:23</th>\n",
" <td> 0.923264</td>\n",
" <td> 0.278048</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:24</th>\n",
" <td> 0.105673</td>\n",
" <td> 0.246425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:25</th>\n",
" <td> 0.204324</td>\n",
" <td> 0.770854</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:26</th>\n",
" <td> 0.206739</td>\n",
" <td> 0.702672</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:27</th>\n",
" <td> 0.180858</td>\n",
" <td> 0.440023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 03:31:28</th>\n",
" <td> 0.269232</td>\n",
" <td> 0.684387</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 9,
"text": [
" column_1 column_2\n",
"2013-06-06 03:31:19 0.620274 0.720347\n",
"2013-06-06 03:31:20 0.603379 0.214927\n",
"2013-06-06 03:31:21 0.508877 0.562860\n",
"2013-06-06 03:31:22 0.236682 0.062259\n",
"2013-06-06 03:31:23 0.923264 0.278048\n",
"2013-06-06 03:31:24 0.105673 0.246425\n",
"2013-06-06 03:31:25 0.204324 0.770854\n",
"2013-06-06 03:31:26 0.206739 0.702672\n",
"2013-06-06 03:31:27 0.180858 0.440023\n",
"2013-06-06 03:31:28 0.269232 0.684387"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_2.index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 10,
"text": [
"<class 'pandas.tseries.index.DatetimeIndex'>\n",
"[2013-06-06 03:31:19, ..., 2013-06-06 03:31:28]\n",
"Length: 10, Freq: None, Timezone: None"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, now we can see that the time is lagging the current local time and the index does not yet have an associated timezone. Thus, we first put the index in UTC to associate a timezone and then localize with our timezone (Asia/Kolkata) in our case."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_2.index=df_2.index.tz_localize('UTC').tz_convert('Asia/Kolkata')\n",
"df_2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>column_1</th>\n",
" <th>column_2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-06-06 09:01:19+05:30</th>\n",
" <td> 0.620274</td>\n",
" <td> 0.720347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:20+05:30</th>\n",
" <td> 0.603379</td>\n",
" <td> 0.214927</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:21+05:30</th>\n",
" <td> 0.508877</td>\n",
" <td> 0.562860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:22+05:30</th>\n",
" <td> 0.236682</td>\n",
" <td> 0.062259</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:23+05:30</th>\n",
" <td> 0.923264</td>\n",
" <td> 0.278048</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:24+05:30</th>\n",
" <td> 0.105673</td>\n",
" <td> 0.246425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:25+05:30</th>\n",
" <td> 0.204324</td>\n",
" <td> 0.770854</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:26+05:30</th>\n",
" <td> 0.206739</td>\n",
" <td> 0.702672</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:27+05:30</th>\n",
" <td> 0.180858</td>\n",
" <td> 0.440023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-06-06 09:01:28+05:30</th>\n",
" <td> 0.269232</td>\n",
" <td> 0.684387</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 11,
"text": [
" column_1 column_2\n",
"2013-06-06 09:01:19+05:30 0.620274 0.720347\n",
"2013-06-06 09:01:20+05:30 0.603379 0.214927\n",
"2013-06-06 09:01:21+05:30 0.508877 0.562860\n",
"2013-06-06 09:01:22+05:30 0.236682 0.062259\n",
"2013-06-06 09:01:23+05:30 0.923264 0.278048\n",
"2013-06-06 09:01:24+05:30 0.105673 0.246425\n",
"2013-06-06 09:01:25+05:30 0.204324 0.770854\n",
"2013-06-06 09:01:26+05:30 0.206739 0.702672\n",
"2013-06-06 09:01:27+05:30 0.180858 0.440023\n",
"2013-06-06 09:01:28+05:30 0.269232 0.684387"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_2.index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 12,
"text": [
"<class 'pandas.tseries.index.DatetimeIndex'>\n",
"[2013-06-06 09:01:19, ..., 2013-06-06 09:01:28]\n",
"Length: 10, Freq: None, Timezone: Asia/Kolkata"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"References\n",
"\n",
"* https://github.com/pydata/pandas/issues/3757\n",
"* https://github.com/pydata/pandas/issues/3746"
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment