Skip to content

Instantly share code, notes, and snippets.

@Skyentific
Created September 30, 2018 08:21
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Skyentific/4f581e23e54486080810724e48773ba3 to your computer and use it in GitHub Desktop.
Save Skyentific/4f581e23e54486080810724e48773ba3 to your computer and use it in GitHub Desktop.
Fast processing with pandas
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fast processing with Pandas\n",
"https://realpython.com/fast-flexible-pandas/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.23.4'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"pd.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read in the data to a pandas dataframe"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_time</th>\n",
" <th>energy_kwh</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/1/13 0:00</td>\n",
" <td>0.586</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/1/13 1:00</td>\n",
" <td>0.580</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/1/13 2:00</td>\n",
" <td>0.572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/1/13 3:00</td>\n",
" <td>0.596</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/1/13 4:00</td>\n",
" <td>0.592</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_time energy_kwh\n",
"0 1/1/13 0:00 0.586\n",
"1 1/1/13 1:00 0.580\n",
"2 1/1/13 2:00 0.572\n",
"3 1/1/13 3:00 0.596\n",
"4 1/1/13 4:00 0.592"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('Fast_pandas_data.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Quickly plot the data to see what it looks like"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x11cffbe48>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEBCAYAAACDu+UiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl4VOXZBvD7IYQdWQOigEFAAWVRAqICVVEKYqVW9MO2VmwttWqldrFoq1KqgtUKIiCCO4qCiIKssojsSNiXsAQIEAgkBMhG9jzfHzMZJpNZzsycycyc3L/r4mKWd855c+bMc97zrqKqICIia6kR7gwQEZH5GNyJiCyIwZ2IyIIY3ImILIjBnYjIghjciYgsiMGdiMiCDAd3EYkRke0istDNe7VFZLaIJIvIZhGJNzOTRETkH39K7qMAJHl473cAzqtqBwATALwWbMaIiChwNY0kEpHWAIYAeAXAX9wkGQpgjP3xXACTRUTUy/DX5s2ba3x8vF+ZJSKq7rZu3XpWVeN8pTMU3AFMBPAsgIYe3r8SwAkAUNUSEckC0AzAWU8bjI+PR2JiosHdExERAIjIMSPpfFbLiMg9ANJVdau3ZG5eq1RqF5GRIpIoIokZGRlG8kdERAEwUud+K4B7RSQFwBcA7hCRT13SpAJoAwAiUhNAIwDnXDekqtNVNUFVE+LifN5VEBFRgHwGd1V9TlVbq2o8gOEAVqnqr12SLQDwiP3xMHsaTjdJRBQmRuvcKxGRsQASVXUBgPcBzBSRZNhK7MNNyh8RhVFxcTFSU1NRUFAQ7qxUO3Xq1EHr1q0RGxsb0OclXAXshIQEZYMqUWQ7evQoGjZsiGbNmkHEXdMahYKqIjMzEzk5OWjXrl2F90Rkq6om+NoGR6gSkUcFBQUM7GEgImjWrFlQd0wM7kTkFQN7eAR73KtVcE9Oz0XK2bxwZ4OIKOQCblCNRne++QMAIGX8kDDnhIgotKpVyZ2IyCwpKSm4/vrr/f7c6tWrcc8994QgRxUxuBNRtVJSUhLuLFSJalUtQ0SB+/e3e7HvVLap2+xyxWV46WfXeU3z6aefYtKkSSgqKsJNN92EqVOnolGjRhg1ahQWLlyIunXrYv78+WjZsiUyMjLw+OOP4/jx4wCAiRMn4tZbb8WYMWNw6tQppKSkoHnz5njvvfcwYsQI7N+/H507d0ZKSgqmTJmCnTt3Ys+ePZgwYQIAYMaMGUhKSsKbb77pNY9HjhzB/fffj+nTp+Oll17C+PHj0a1bN9xwww2477778OKLL+KFF17AVVddhQ4dOiA3NxfDhg3Dnj170LNnT3z66aemN1yz5E5EESspKQmzZ8/G+vXrsWPHDsTExOCzzz5DXl4e+vTpg507d6J///6YMWMGAGDUqFF45plnsGXLFnz11Vd47LHHHNvaunUr5s+fj1mzZmHq1Klo0qQJdu3ahRdeeAFbt9qmzho+fDgWLFiA4uJiAMCHH36IRx991GseDxw4gPvvvx8ffvghevXqhf79+2Pt2rXIzs5GzZo1sX79egDAunXr0K9fPwDA9u3bMXHiROzbtw9HjhxxpDETS+5EZIivEnYorFy5Elu3bkWvXr0AAPn5+WjRogVq1arlqLfu2bMnli9fDgBYsWIF9u3b5/h8dnY2cnJyAAD33nsv6tatC8AWaEeNGgUAuP7669GtWzcAQP369XHHHXdg4cKF6Ny5M4qLi9G1a1eP+cvIyMDQoUPx1Vdf4brrbMenX79+mDRpEtq1a4chQ4Zg+fLluHjxIlJSUnDttdciLS0NvXv3RuvWrQEAPXr0QEpKCvr27WvacQMY3IkogqkqHnnkEYwbN67C62+88YajGiMmJsZRj15WVoaNGzc6griz+vXrV9iuJ4899hheffVVdOrUyWepvVGjRmjTpg3Wr1/vCO69evVCYmIirr76atx11104e/YsZsyYgZ49ezo+V7t2bcdj5/ybidUyRBSxBgwYgLlz5yI9PR0AcO7cORw75nk684EDB2Ly5MmO5zt27HCbrm/fvpgzZw4AYN++fdi9e7fjvZtuugknTpzArFmz8NBDD3nNX61atfDNN9/gk08+waxZsxyvtWnTBnPmzEGfPn3Qr18/vPHGG44qmarC4E5EEatLly54+eWXMXDgQHTr1g133XUX0tLSPKafNGkSEhMT0a1bN3Tp0gXTpk1zm+6JJ55ARkYGunXrhtdeew3dunVDo0aNHO8/+OCDuPXWW9GkSROfeaxfvz4WLlyICRMmYP78+QBsVTMtW7ZEvXr10K9fP6SmplZ5cK9WE4fFj14EgIOYiIxKSkpC586dw50N05WWlqK4uBh16tTB4cOHMWDAABw8eBC1atUCANxzzz145plnMGDAgLDm093xNzpxGOvciajauXjxIm6//XYUFxdDVfHOO++gVq1auHDhAnr37o3u3buHPbAHi8GdiKqdhg0bul3DuXHjxjh48GCF1zIzM90G+pUrV6JZs2Yhy2OwGNyJyCtVrdYzQzZr1sxjw2woBVtlzgZVIvKoTp06yMzMDDrQkH/KF+uoU6dOwNvwWXIXkToA1gCobU8/V1VfckkzAsDrAE7aX5qsqu8FnCsiigitW7dGamoqMjIywp2Vaqd8mb1AGamWKQRwh6rmikgsgHUiskRVN7mkm62qTwWcEyKKOLGxsZWWeaPo4DO4q+1+LNf+NNb+j/doREQRzFCdu4jEiMgOAOkAlqvqZjfJ7heRXSIyV0TaeNjOSBFJFJHEUNzmfbDuKJLSzJ21jogoGhkK7qpaqqo9ALQG0FtEXGeo/xZAvKp2A7ACwMcetjNdVRNUNSEuLi6YfLs1duE+DH5rrenbJSKKNn71llHVCwBWAxjk8nqmqhban84A0BNERBQ2PoO7iMSJSGP747oA7gSw3yVNK6en9wJIMjOTRETkHyO9ZVoB+FhEYmC7GMxR1YUiMhZAoqouAPC0iNwLoATAOQAjQpVhIiLyzUhvmV0AbnDz+otOj58D8Jy5WSMiokBxhCoRkQUxuBMRWRCDOxGRBTG4ExFZEIM7EZEFMbgTEVkQgzsRkQUxuBMRWRCDOxGRBTG4ExFZEIM7EZEFMbgTEVkQgzsRkQUxuBMRWRCDOxGRBTG4ExFZEIM7EZEFGVlDtY6I/CgiO0Vkr4j8202a2iIyW0SSRWSziMSHIrNERGSMkZJ7IYA7VLU7gB4ABolIH5c0vwNwXlU7AJgA4DVzs0lERP7wGdzVJtf+NNb+T12SDQXwsf3xXAADRERMyyUREfnFUJ27iMSIyA4A6QCWq+pmlyRXAjgBAKpaAiALQDM32xkpIokikpiRkRFczomIyCNDwV1VS1W1B4DWAHqLyPUuSdyV0l1L91DV6aqaoKoJcXFx/ueWiIgM8au3jKpeALAawCCXt1IBtAEAEakJoBGAcybkj4iIAmCkt0yciDS2P64L4E4A+12SLQDwiP3xMACrVLVSyZ2IiKpGTQNpWgH4WERiYLsYzFHVhSIyFkCiqi4A8D6AmSKSDFuJfXjIckxERD75DO6qugvADW5ef9HpcQGAB8zNGhERBYojVImILIjBnYjIghjciYgsiMGdiMiCGNyJiCyIwZ2IyIIY3ImILIjBnYjIghjciYgsiMGdiMiCGNyJiCyIwZ2IyIIY3ImILIjBnYjIghjciYgsiMGdiMiCjCyz10ZEvheRJBHZKyKj3KS5TUSyRGSH/d+L7rZFRERVw8gyeyUA/qqq20SkIYCtIrJcVfe5pFurqveYn0UiIvKXz5K7qqap6jb74xwASQCuDHXGiKh6WLb3NP4wMzHc2bAcIyV3BxGJh2091c1u3r5ZRHYCOAXgb6q6N+jc+SH1/MWq3B0RmeQPM7eGOwuWZDi4i0gDAF8B+LOqZru8vQ3AVaqaKyJ3A/gGQEc32xgJYCQAtG3bNuBMu3M6q8DU7RERRTNDvWVEJBa2wP6Zqs5zfV9Vs1U11/54MYBYEWnuJt10VU1Q1YS4uLggs14pj6Zuj4gomhnpLSMA3geQpKpvekhzuT0dRKS3fbuZZmbUF8Z2IqJLjFTL3ArgYQC7RWSH/bXnAbQFAFWdBmAYgD+KSAmAfADDVVVDkF+PGNuJiC7xGdxVdR18xE5VnQxgslmZIiKi4FhmhGoN1ssQETlYJrgzthMRXWKZ4E5ERJdYJrgLm1SJiBysE9wZ24mIHKI+uH/+43H87qMt4c4GEVFEifrg/ty83Vi5P50ldyIyzcEzOTiWmRfubATFr4nDIhnr3InILAMnrAEApIwfEuacBC7qS+7lWHInIrqEwZ0sKz27ADf+ZzkOnskJeBubj2TifF6R4fRFJWX4w8xEHApin1by0vw9eG7ernBno1qyTHAncrU86QzO5RXhw/UpAX2+rEzxf9M34VfvuVu+wL1dqRewbO8ZjJ6326995RQUIzHlnL9ZjHgfbzyGz388Ee5sVEuWCe5VO00ZVQflp9T+067LF5jvT59vx7BpG5GVX2z6tn8+ZT3mbUs1fbuAbR2FP8xMxMWikpBsnwJnmeDuL1VFUUlZuLNBBADYn2arxskrNC9Ifrf3NHaeuIAdJy7gL3N2mrZdZ/9dth/L9p7B4t2nvaaLH70I/5i7C/lFpSHJB1VWbYP7q4uTcM2/ljDAV5GX5u/Bl4nRc3seP3oRnvezaiUYsTVtjUbFpeadjyNnbsXQKetN216wZieewLQfDoc7G9VGtQ3uszYfBwAUmfhjokuW7jmNa/61xFFS+3jjMfx9bnQ1rM2uwotRrRjbT9HqhQ3+3qpOtQ3uFDobks9i3JIkFJWU4eSFwBYuf/j9zXhq1jb8Zc4OPOdSgr7rzR/w7Fzj1Qxzt5oXpEvLFB+tP4rCEvfVC6ezA1vLN7Y8uEdB8CsrU2w7fj6gz5rZNlZapvgy8QRKy9jg5g6DewDO5hZi/JL9PKnc2HrsHH753mYcy/Qd1JPTczDsnQ0V6pn/+OlWvPvDYaw9dBYLd6Vh3raT+PzH447395zMwqH0XMxJNN5AWFxq/Hs6n1eEkxfyPb7/9faTGPPtPkxZlez2/admbQdga9Mpd+FiEebvOFkp7auLk/DQ9E0A/FsDuKC4FPGjF2Hu1tA0kvry4YYU/GLqBqw5mBGW/Zebtdl2NzhzY0pY8xGpfI5QFZE2AD4BcDmAMgDTVfUtlzQC4C0AdwO4CGCEqm4zP7vmCSYsPzdvN5bvO4Ob2zfDT64xd6HvaJeRY7xP+LjF+5F47Dw2Hs5EhxYNEFNDsGTPaSzZ47lxLj0nsJKxUTeNW+m1aiS3wNabxZ9eLU/O2ob1yZm4sW0TtGlaz/H69DVHKqWdtPIQzl8sxpw/3Oxxe2dzCwEAE5YfxLCerQ3nwyzlffi9XQQ90aB+eRVl2scfnLtofg8jKzBSci8B8FdV7QygD4AnRaSLS5rBADra/40E8I6puTTA+XZvxb4zGDp5HcoMlKwFwNGzeY4fjBHlP/4y9r8EYCulLtqVhpIgqhRue2M1+v33+6DykZyeU6FB0vnrGbckyXGHUFqm+O/S/TjnZnCS2XXeKWfzsD7ZtlZ8eZXLqQv5HgPjsr1n8OPRyOjvnl9UivScAryyaB9OnPNwJ8afQMQysoZqGoA0++McEUkCcCWAfU7JhgL4xL4o9iYRaSwireyfrXKjvtiOvKJSXCwuRYPavqfPuf2N1agVUwMHXxlsaPtrD4X3djTSfLsrDU9/vh3/GNQJ7ZrXN/y5lfvTTcvDyQv5uPPNNRhxSzzG3Htdpfff/cFWSn5ucGesOZSBqasP4+jZPLzz656Gtn/hYhHqxMb4na/b3lhd6bVbxq/yeztGHTidgzqxNXBVM+Pfgyf3TV2P/adtpfRNR87h2z/19ZjWXaWSsvATVn7VuYtIPIAbALgO2bsSgHOrVar9tahhtCErKS0bkVzV/v3+dKQH2KgXqEz7Xc+ZKt6vs3O5tlL4piOZjny4VmMXl1T84vL86HPdY+xyv0aquuMt1vkze0aZ/VTNyi+uFEB/OnENfvL6ar/zBtiC8VdbUx13P+WBHUCF9qUvthhroP73t/s8vpddUIzXl+0P6m6PvDMc3EWkAYCvAPxZVV2H7Lm9cLvZxkgRSRSRxIwMc0u//s4tE2ihItgRhKrqVxWQvx79aAvun7bBtO0VFJdi6R7vN2Dlhz4Sqqn2n87BTa+u9DofTM0athyXlhkLLOV/1dZj5/H1jlMAbF07Z2465vEz/jSQOu/DiF++Z2uEzS0swYy1levtA7VwVxr++uVOTPbQWOyvjzakeHzvtSX7MeX7w1iw85TjtYBHuUbAeReJDAV3EYmFLbB/pqrz3CRJBdDG6XlrAKdcE6nqdFVNUNWEuDhzGyIzci4FTH++6qAmHAvgnJq56RgSXl4R0omlTpzzv6HLk1cWJeHxT7d5nfekhj1YqoZ+Ajd3Uzt/s/0kfjZ5XYXXvF2Ea9awnfb+9KIpt/PEBcfj2VuOe0wXyiqJ1POXvt8VSeZVbV2wH7NQFj7KFdrbNkqc7gjSsvy786vqab5HfPgj/vZlaEb6hoLP4G7vCfM+gCRVfdNDsgUAfiM2fQBkVXV9u7uRfc5fvaeZAav6or/m4FkAtkbcaJB63taQll3gOViGu+T+8iLPt//OPlh/FHMSTyA2xpZjo1UCgfxZ245fwIbks/5/0F9mHvIgvj/Dn/RzF/lFpY5FM0rL1OP4gkBl5RcbOg+S0rKx+kCGoe6n76w+HBEjcY2U3G8F8DCAO0Rkh/3f3SLyuIg8bk+zGMARAMkAZgB4IjTZ9ayGjyJj+eT75QItYQYfv6LnFvLHo+eMVUPZD2Y0/GXPzt2FmvYBQyUmN558f6BiKfqXLnX03o6l6+lYWFKKO95YHZbGe6O/jaooN//+k0RHG8JTs7bh2n8tdbxnRrfK7v/+Dn+evcNnusFvrXU83nsqq9L7O09cQPzoRdhzMguvLd2P8Uv2B523YPkM7qq6TlVFVbupag/7v8WqOk1Vp9nTqKo+qartVbWrqiaGPusuqqjO3Sz+1sn6a32QpcaC4lI8+O5GbDt+wWdae61M2HpHnM2tXL/uLSeX8hvcfp2rBTYdycSjH3pfy/f+d4y3hZw4dxFHzuZhzIK9Aecv1JwPn+Gz2eXYP+tjSop1Tuexx/EPQf6WFu7yr5JhyKR1lV5bvu8MAGCViT3AgmWZEaqBfr1WXeQj2LpBf6pYuMQhkOnmAhMMo4ffzEFB0XDn5c6U75PD3u2y/HuoEUE/BesEdy9ROsdLfbG/nEdIbjySafhzR8/mIS0r39Q7hvjRizDWS3czs6kqxi/Z7/a2NBwGv7UWpwIYJRlNQn2HZ6YDp3NQUOy9TtyMgkB5IC/fVmmZIj0n9I3A3pT/riPp+7JOcHd67BpAn/96T6X0vko8uYUlbkcrOm97sx/B/fY3VuPmceYPXvlg/VHTt+nObz9KRGFJGab9cBi/mOq+eiHQC1degF3gktKycVuAfbrNsPtkFqZ8n4w9J41f7I57mHPHNSYEcih9lV47PL8Yj374o8f3gwlL2QXF+OnENYbvGN39/oyeP4czcgP+bKhE4l2PdYK7mzOz/LUML/OReCpJXP/SMjzw7kYzsua3L348HlVrcAZbWHl1cZLhtG+vOlThebhnUXx92QHc83blOlhPBr21xnciOJUE/ciLrwbikjLF9wc8N9AGE6Au2geEbamCpQJDPYgwkCqecF9c3LFMcPcm0FtB5z7N5d747oDjcTDfp7ccjZ63GwMn2oLAukNnsdTLRFqR4Kz9ltjITJDu5BYYL7kbaeAFvP9Aw/lDvGhwVOxx+1wuvto+cpyOnVkVAv78XsJxKN0dErPaHuZsOYF2zy32+3Pl+4+gWhnfc8tEC+eukK5ftJkH3HkASSBBwuhHyrf96/dt3elSxg/xaz9VeY79b/lBALY2iFvaN6v0fq6JS8eZ4evttul3zTwvzP5RP/25bergwxnex0M4TxEQLF/nc8LLy/Fwn3jHc+c/2d+/3/0FJPzF32e/CnBBGcedVuREd8uU3L0dUncnnhmlt2BKC5F0hXfHXe8PI8fsE5ch+c/N242LPoK7kaM4c9Mxt3WtnpzNLcJqD1UQ3obFRwoze8H4y9O5eTa3CBNWHDRl22b8fZH0G9qVamt38TSraF5hiakdO4ywTMndmevV09vVNJgTJJgLhNm9Zqb+6kbc3bWVadsMtPdBhsvntqQEtmKPqxe+2YO6fszK+GCY2kuqE189Y9zx/nPzcz4ep99QbkEJ0Mjv7JimvOfc/tOu027Z3DB2OYpKy/y+Aw+GZUruzvwpFURKgPZ/3xV3/sRn2yqUbEORtVCVlIwex/wAgokZ+zUikEDnfXuVS4BJadmG1iioKmNM7oY7YcVBx2AgI5ynHLlrgrGG6lDzdE6Fo+E/qoN7hauku94yfpbYn/96N05eyDfcWh5Ug2oIAmV2kDNWVlR1QcTsoB2oYAbC/GVOaCeU2nMyC4PfWovJ35szY6PZzLhQLtqVht9/Ynxwu5nHwt8ZKZ1jj/Nd4sr9xi9OoRaVwT0jpxD5RaWYa3AdTaN17rM2H8ffQzzrW8DTmvopkGvHtB8OIzndVvp32yMhcgqNIRHOFZB8XezLB2uV1+1GivJsv77M1ovsTLatWs7TRHMzNx6z/S1Bnktfb3f/288vKsW8bal+X6j9nQN/0MS1jtkznc+bQGYaDZWoDO69XlmBYdM2OKaadeXP9xqKEnReYQn6/XeV22lyNx05F7L9Tl0d+Ex0xaVlGL9kP4ZO9t1n2++8R1DDlztJadlYtvc0LkbIHYR35gWPnILiCouTB9S/28Prnrq35hSWYISXgVSArYrrTHZBhdHgrnl7Zrb7QtjYhfvwlzk7Hb8zo1zbiso9MG0D7nCzmhaACscuEkVlcAeAvaeyPQaZQpcWa29Dgv/1TeXRq0Z5+jHsPpmFE+fy8d9lB9y+HyrO9ZWn/Jwbe942W0nIn9WJrGLwW2vxh5lbw50Nr+ZtO+kzjb+hueuY73DDf5Zj6Z7TFc7lUF+Lfc002umFpbjp1ZXo/cpKt+97a1MrX4XMrMC7JeU8jniYntvXtfD9dVUzetyTqA3uwKV1MQF4PbO9nazu5mf29KX193MB52OZeXh54T5DJaKikjJkuazinpTmvuU9FP7x1W7Daa1ePROJlu4tH8hmbugtKinD459uNTRPuVHvrzuKjYeNT81hVf9ZWHXzPrkT1cHd2X8WGR/CDgR2c3vcZQV4TwNIyoPfmexCvLfuqKMe25s/froV3cd+V+E15zmkI0Ek9Su2GuODX9yfuccy87DFqe73XT8Xi/DV9dWf7/4/C/fhrxG2YtHCXafQ+YWlpvdqimSW6efurZQb7qBk5EKy0s95oJ1HylYVlthDx1tVwwEDo1BdGwTH+blYRHp2Ado1r+/XZwJV3ujqj2DPvXGL9yO/uBQZOYVo07SefZuhP6EX7jqFz3+8tBxjSWkZ9pzKRo82jUO+b8uU3L2pijUhvXE3P0iww5T7Gagiih+9qFJVjxnCfbE0k6cRhZHkpxO99+F2t8Skvz7eeMzrhFzRfmG3L5tb5UtBPjVrO9YnX6qimrDiIH4+ZT12V0GvJyNrqH4gIuki4rblUURuE5EspyX4XjQ/m4FbczADe04ar7sWMb+Hd1kY48fhs8aH7Fc3IkDXMcvCnY2gmT0a1585ycO9SIZR5XNPhXsMWHlbREaufx0eAmGk5P4RgEE+0qx1WoJvbPDZMs+uVGOzCIZSOOcJ8bS27JnsgmpV/+iJa8+qcAnmTm67wZkyI8nZvMBXrvJ2PTlkb9866bKIy6XgHt6Vq8pnNa2Ka6KRNVTXAAjf6I4gBBJUQ3HQw1m4ifEQ3G96dSWe+GxbleQhrzAyLyKu38vkVZE5+rOqVOVpuuagfwt/G81beacH12myQ1GTuMKPqRJcRURwN+hmEdkpIktE5DqTthlVSss06EWpQ8HbHfaq/enYfvy821trM8+9Z+dGVs8JT7YeM2eSs1AKV0GhJJx1iyZwd9iCPZans0NftRIMM4L7NgBXqWp3AG8D+MZTQhEZKSKJIpKYkeHflbsqBVKPOOX7ZPzqvc1Yd8hggK+iRklP1TLl7pu6AZ9tPu41TbDC0bOH/Pepy3TNzg6ecd92882OU6HKTkhMWZUcEY3oVXGNDjq4q2q2qubaHy8GECsizT2kna6qCaqaEBcXF+yug+Pl6C7bW/l2y9NUngCQev6io9SeeCyyarBq1AA+3pBSoceQ68Vr0spDrh/zyt9CnPPl5bu9pxE/ehEyw9yDCbCNJI4URtswzeipFD96EY66GXXp7jWrKD/n520/idvfWI3i0jJTAuzi3WkmbCU0gg7uInK52JvXRaS3fZsRMTztix9P+L0a+YbDmXhyVuW66EETPQ8o6vva99hsH0AyY82RSnX9n2xMqfSZqupNmJyei5cW7HWs7ANUvh11N4DF3c3LD/Z60qLSsoCD80j7MP8DUbRGbCQxq1pmXQRWIXrjXCAxMmWIt/a2kxfy0fGfS/ChCYvLB9puVRW9jIx0hfwcwEYA14pIqoj8TkQeF5HH7UmGAdgjIjsBTAIwXCOkf9RYg0P/zeRubpY5BmevDIUS+yx1F5z6uwd6RNKyLlWv/PZj41Oz+nuBJfImkLYRd+dg+XKLVuVzhKqqPuTj/ckAJpuWoypQVFIW0u6Jv5yx2VC6/KJSdH5xKab88kafaa97cSlGD+6Eh2+O9ysv5ee087ltxgUv9dxFHDqTY6gR2bVbGkU2EWDD4cgq2X+y0XN7gC97TmZZusrJE8tMP+CJuyv2Nf9aEoacVFYe9P633PfskXlFpXhh/l6/g7v7leJ9c1fYdn3pnrfXRUw/cfKTlwt8SakaLqBUlbFBTMIVqiUXI6N+wjPLB/dg5RWWVKiOCIkqOEnKg/WhMzk4b8KUBIrgBgBF0irxVFFVD9EPBefzK8bDug/hVBVHmMHdh99/kogNIZi+VEQcAff8ReOj9TYdyUSfq5sFvN9IWWuSIpev7rPRYOORTLz53QH8tm87xMZE3hRa0TSIKWI5LxodiFAEdlf0aWqjAAAP00lEQVTOJWlfMwD+3o+GTGf+lpTdnXzOS4hFSJu5ZVR1OH3dy0IyFojtAIBJq5IxfPom1AxRyT3Hw1KCkcLyJfdomz/F1wyAOYUlmL/DeCt/oBe37/aervTaK4svzZnvazUdimzZHpbBsxpPay4AwZeevwxqgZMI6AoZ7QqKQ9vgF0wJNtDyxKgvdhhO+7Z9vpTdJ7OQa3DpsXWHzuI9H0uEhXt2vepq2/HQT5FgkYK7T/tc1oAoMWHq5Ehi+eC+ys9FMKrK+byiKu///dL8vYbSTVhxMMQ5oUCZ0RjuS3VcRxcAfvOB94W7o43lg3uoBVpw//PsHZWW7Qs159Xkw+2hGZvCnYWIkZaVjzQ/FzQPpQ1RNnrVLFXRvlaVLF/nHskeqeKSgtEJk6rLbXmkuHncqnBnoQLWuIUee8tEgT2nImfyKV+KLFanSKERSXcRVhUVs0JWdxejqH7SaCNoYhTMa07k6kQVV3NGOgb3IA2fzrpjokgQTXemrJYhIrKgotLQ3/EzuBORJURTR4C/f7kr5PtgcK9Gdp64EO4sEIVMNK0bUFIFowAZ3ImILIjBnYjIgowss/eBiKSLiNuFC8Vmkogki8guEfG9rBARkcm+3Xkq3FmIKEZK7h8BGOTl/cEAOtr/jQTwTvDZIiLyz5vLOSeSM5/BXVXXADjnJclQAJ+ozSYAjUWklVkZJCIi/5lR534lgBNOz1Ptr1UiIiNFJFFEEjMyMkzYNRERuWNGcHfX/8htPx9Vna6qCaqaEBcXZ8KuiYjIHTOCeyqANk7PWwNgywYRURiZEdwXAPiNvddMHwBZqppmwnaJiChAPudzF5HPAdwGoLmIpAJ4CUAsAKjqNACLAdwNIBnARQCPhiqzRERkjM/grqoP+XhfATxpWo6IiChoHKFKRGRBDO5ERBbE4E5EZEEM7kREFsTgTkRkQQzuREQWxOBORGRBDO5ERBbE4E5EZEEM7kREFsTgTkRkQQzuREQWxOBORGRBDO5ERBbE4E5EZEFRF9z//e3ecGeBiCjiGQruIjJIRA6ISLKIjHbz/ggRyRCRHfZ/j5mfVZs2TeqFatNERJbhM7iLSAyAKQAGA+gC4CER6eIm6WxV7WH/957J+XQYdP3lodo0EZFlGCm59waQrKpHVLUIwBcAhoY2W57VEAnXromIooaR4H4lgBNOz1Ptr7m6X0R2ichcEWljSu7cqMHYTkTkk5Hg7i6cqsvzbwHEq2o3ACsAfOx2QyIjRSRRRBIzMjL8y+mlbQT0OSKi6sRIcE8F4FwSbw3glHMCVc1U1UL70xkAerrbkKpOV9UEVU2Ii4sLJL8suRMRGWAkuG8B0FFE2olILQDDASxwTiAirZye3gsgybwsVsSSOxGRbzV9JVDVEhF5CsAyADEAPlDVvSIyFkCiqi4A8LSI3AugBMA5ACNClWGW3ImIfPMZ3AFAVRcDWOzy2otOj58D8Jy5WXOPJXciIt+iboQqS+5ERL5FYXBndCci8iXqgjtjOxGRb1EX3FlyJyLyLeqCO2M7EZFvURfcWXInIvKNwZ2IyIKiLrgztBMR+RZ9wZ3RnYjIpygM7ozuRBTd+nVsHvJ9RF1wJyIi3xjciYiqWFXUQDC4ExFZEIM7EZEFMbgTEVlQtQjuH4xIcDxuUNvQFPZEVE10bNEg3FkIiWoR3O/o1NLxuGaMYGCXlvjHoE5hzBERRYoGdaxZ4DMU3EVkkIgcEJFkERnt5v3aIjLb/v5mEYk3O6PO1j57e8CfrVmjBqb/JgH33XAlAKBFw9oBb6tB7ZoYfP3lXtNUVamg0+UNq2Q/ROS/hKuaVPk+fQZ3EYkBMAXAYABdADwkIl1ckv0OwHlV7QBgAoDXzM6oszZN6/mRtm6F52880A0AULum7U/v1rpxwPnY8++f4p1f9/Sa5q8Dr6n0Wq2YGrjt2riA9+vO0j/3r/Ra3w6VB0o0b1DL1P1S1frxnwOC+vxlBkupv7qpbVD7iUYzfpPgO5HduF90BQA8N7gTrr/yMp/pH/9J+4DzFSgjJffeAJJV9YiqFgH4AsBQlzRDAXxsfzwXwACJkKGktWIq/om3XdsCANCkfi3Me+IWvDW8h89tPHPnNegV7/nKe0enFh7f69iyIXq3a1rhtWsvb+izxO/s131sP7Txv+iKhk5tBr6qlm5u38zwPpy9dn9XvP3QDZVer1crxvF47bO3V9h/9zaBXyTDKTLOUuNaNKwT8GdrCPCnOzr6TNfystp45b6uAe/Hl4MvD/YrfffWjTy+53xOBqtpfVvBp0XD2vjv/d08prvvhivxUO+2WPDUrfh9v6vx8aO98b8HumP5M5ULWAO7tETK+CG4s0vLCq9XxWlnJLhfCeCE0/NU+2tu06hqCYAsAIFFFoP6X2Mr+f7hJ1cjNsZ2qNrH1a+UrnUTWyl/9sg++GJknwrv3di2Cer7aGD9592dMerOjvjy8Vuwa8xAt2mmP9wTY37WBZ0ub4jNzw/AX++6VFpvH9cAkx+6AaMHd3LcLXRo0QA3tr10sZj3xC1IGT8Ei5/uh6vtf8OATi1Qu2YNdG/dCC//vCtSxg/B8N5tK+Thj7e1x6Kn+zpKc73jm+K1+7tixV/647G+7fDbW9vhmTsr3jm0j2uAb5/qi6cHdESrRpcCReN6sfhZ9ysw7dc34v96tcXPul+BlPFDKqxZ+8GIXgCAXvFN0KZpPQzp2srx3vwnb8W3T/X1eiyD9fzdnRwXunKeLirl54QvI/tf7Xg85meuN6SX3NOtlcf3gvHsoGsNpVvxl/5Y9HTl43tn58oFi89/36fSa4CtkOLrgm/bTz9DeQKALf+80/H4reE9cPDlwfjznR29VnfWqlkx7NT0sjByndgaeG2Y50A7zX7n3LxBbfSOb4or7Od0+7j6FQoonnbx4z8HoJ/9DrdFw9r47pn+WPrn/niwVxukjB+CLq0ql8rv7X4FANtdf40agmYNauP+nq3RsWVDfGj/jXRo0QBHXr0b7z7s/s6+5WWBVwcbJarqPYHIAwB+qqqP2Z8/DKC3qv7JKc1ee5pU+/PD9jSZLtsaCWAkALRt27bnsWPHAs54TkExvj+QgXu7X4FDZ3LwwfqjGD24M2rWEExfcwQ92jbGxcJS9O3QHI3qxXrd1pTvk3FTu6bo1OoyvL3qEFo3rotbOjRH+7jK9eXfbD+JNQczMPC6lhh0vecf/KkL+cgrLEHHlpfqwguKS/H8vN14/YHuiAlipe9zeUWoE1sD9WoZbwiaty0V5y8WY9iNrR3H49SFfDz8/ma8/dCN6HKF51vLJbvTcGWTuujS6jI8//VuPHV7R7RtVs/xN5WpOvLyzfaTqBNbAw3rxOLr7ScxrGdrnDh3ET+/4UqUqWLSykO4qml9xNQQHErPxWV1a+K/Sw/ghXu6oEHtGNzRqSXmbk1FbIwgrmFtjPpiB95/JAG3dmiO2jVrIK+oFIMmrkHq+XyM+0VXPNS7LeZtS0VhSRl6xTfB3W+tw7CE1vjn3Z2RnlOIOYkncDanEL3bNcXf5+7Cuw/3xO7ULHy9/STW/eN2iAhOZxWgbmwMGtWLxcJdp7ArNQvDerbGwAlr0KFFA4zsdzUe7NUGv5i6Hi0vq4Mle07jydvbo1vrxkhOz8WTt3fA+bwipJ7Px9rkDFzdvD6y8otRQwRXNauP1QfS0bR+Lazan45HbolHwlVN8O6aI2gfVx8PJrTBi/P34pc3tcXMTccwa/NxLH+mP95elYwFO0/hqz/egm+2n8TYodc5RjW+ufwgJq08hPd+k4B+1zTHyfP5WJd8Fhk5hbiqWX0M69na8HlxOqsAfcatxJBurTB6UKcKVZ5L96Qh9Xw+/vfdQUwc3gO3XRuHWjE1cPBMLjYfzcRdXVqiVaO6mLPlBCDAgwltHJ9VVYxfuh9xDWpjeO+2mP7DYbRv0QDn84ow4tZ2WH0gHZuOnEP31o0wuGsr5BeV4lRWPvadykZ6TiGW7E5DQnxTjLglHpc3qoOC4lJ8vz8dVzSui6vj6uOHgxm4tmVDdGzZEAfP5KBjiwaO45NbWILYGEHtmjGO39zowZ1Qp1YMzmQVYPWBDLz+3QEs/FNfXNOyIcrKFKezC3BF47qVjo+q4tGPtqBXfFN0bNEACfFNHSV8T77enoqbr26OyxtVvMtavu8MWjSsjQOnc3BP91Z+/X6dichWVfVZh2QkuN8MYIyq/tT+/DkAUNVxTmmW2dNsFJGaAE4DiFMvG09ISNDExERDfwwREdkYDe5GqmW2AOgoIu1EpBaA4QAWuKRZAOAR++NhAFZ5C+xERBRaPu8LVLVERJ4CsAxADIAPVHWviIwFkKiqCwC8D2CmiCQDOAfbBYCIiMLEUKWPqi4GsNjltRedHhcAeMDcrBERUaCqxQhVIqLqhsGdiMiCGNyJiCyIwZ2IyIIY3ImILMjnIKaQ7VgkA0CgQ1SbAzhrYnasgselMh4T93hcKouWY3KVqvqceTBswT0YIpJoZIRWdcPjUhmPiXs8LpVZ7ZiwWoaIyIIY3ImILChag/v0cGcgQvG4VMZj4h6PS2WWOiZRWedORETeRWvJnYiIvGBwJyKyIAZ3IiILYnAnIrIgBnciIgticCcisiAGd4p6IjJGRP7m5f2fi0iXALdd4bMiMlZE7gxkW0RVicGdqoOfAwgouLt+VlVfVNUVpuSKKIQY3Ckqicg/ReSAiKwAcK39td+LyBYR2SkiX4lIPRG5BcC9AF4XkR0i0t7+b6mIbBWRtSLSycM+3H32IxEZZn8/RUReFZGNIpIoIjeKyDIROSwijztt5+/2fO0SkX+H/OAQgcGdopCI9AQwHMANAH4BoJf9rXmq2ktVuwNIAvA7Vd0AYAGAv6tqD1U9DNsw8z+pak8AfwMw1d1+PHzW1QlVvRnAWgAfARgGoA+Asfa8DgTQEUBvAD0A9BSR/sEeAyJfaoY7A0QB6Afga1W9CAAissD++vUi8jKAxgAaAFjm+kERaQDgFgBfikj5y7WDyEv5vncDaKCqOQByRKRARBoDGGj/t92ergFswX5NEPsk8onBnaKVu0mRPgLwc1XdKSIjANzmJk0NABdUtYdJ+Si0/1/m9Lj8eU0AAmCcqr5r0v6IDGG1DEWjNQDuE5G6ItIQwM/srzcEkCYisQB+5ZQ+x/4eVDUbwFEReQAAxKa7l305PhugZQB+a79jgIhcKSItgtgekSEM7hR1VHUbgNkAdgD4Crb6bgB4AcBmAMsB7Hf6yBcA/i4i20WkPWyB/3cishPAXgBDvezO9bP+5vU7ALMAbBSR3QDmIriLBZEhnPKXiMiCWHInIrIgNqgSwdZvHsADLi9/qaqvhCM/RMFitQwRkQWxWoaIyIIY3ImILIjBnYjIghjciYgsiMGdiMiC/h+hvHhciks2VAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.plot(x='date_time', y='energy_kwh')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parsing dates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look at the data types"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"date_time object\n",
"energy_kwh float64\n",
"dtype: object"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, the date column has been set to 'object', a sting rather than a date format."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df.iloc[0,0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can convert this to a datatime so its easer to manage the dates"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_time</th>\n",
" <th>energy_kwh</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2013-01-01 00:00:00</td>\n",
" <td>0.586</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2013-01-01 01:00:00</td>\n",
" <td>0.580</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2013-01-01 02:00:00</td>\n",
" <td>0.572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2013-01-01 03:00:00</td>\n",
" <td>0.596</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013-01-01 04:00:00</td>\n",
" <td>0.592</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_time energy_kwh\n",
"0 2013-01-01 00:00:00 0.586\n",
"1 2013-01-01 01:00:00 0.580\n",
"2 2013-01-01 02:00:00 0.572\n",
"3 2013-01-01 03:00:00 0.596\n",
"4 2013-01-01 04:00:00 0.592"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['date_time'] = pd.to_datetime(df['date_time'])\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"date_time datetime64[ns]\n",
"energy_kwh float64\n",
"dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas._libs.tslibs.timestamps.Timestamp"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df.iloc[0,0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the timeit decorator to see how fast the date conversion is"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8760"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('Fast_pandas_data.csv')\n",
"len(df)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.07 s ± 81.2 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit -n 10 -r 3\n",
"pd.to_datetime(df['date_time'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1.07 seconds on average to convert the dates of 8,760 rows. Not bad, but not good if you have millions of rows.\n",
"\n",
"Make this faster by specifying the date format for Pandas to use, that way Pandas doesnt need to interpret the data first."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"28.2 ms ± 2.8 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit -n 10 -r 3\n",
"pd.to_datetime(df['date_time'], format='%d/%m/%y %H:%M')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thats a time reduction of 97%"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loop over pandas data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Electricity costs vary by time of day. To work out the cost of the electricity usage we need to iterate over the dataframe and apply the appropriate time-of-day costs to each row.\n",
"\n",
"Lets assume the following costs:\n",
"\n",
"|Tariff type|Cents per kWh | Time range|\n",
"|- |- |- |\n",
"|Peak | 28 | 17:00 to 24:00 |\n",
"|Shoulder | 20 | 7:00 to 17:00 |\n",
"|Off-peak | 12 | 0:00 to 7:00 |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Naive functional approach"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('Fast_pandas_data.csv')\n",
"df['date_time'] = pd.to_datetime(df['date_time'], format='%d/%m/%y %H:%M')"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"def apply_tariff(kwh, hour):\n",
" \"\"\"\n",
" Calculates cost of electiricty for given hour.\n",
" \"\"\"\n",
" if 0 <= hour < 7:\n",
" rate = 12\n",
" elif 7 <= hour < 17:\n",
" rate = 20\n",
" elif 17 <= hour < 24:\n",
" rate = 28\n",
" else:\n",
" raise ValueError(f'Invalid hour: {hour}')\n",
" \n",
" return rate * kwh"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.45 s ± 594 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit \n",
"def apply_tariff_loop(df):\n",
" \"\"\"\n",
" Calculate costs in loop. Modifies `df` in place.\n",
" \"\"\"\n",
" energy_cost_list = []\n",
" \n",
" for i in range(len(df)):\n",
" # Get electricity used and hour of day\n",
" energy_used = df.iloc[i]['energy_kwh']\n",
" hour = df.iloc[i]['date_time'].hour\n",
" energy_cost = apply_tariff(energy_used, hour)\n",
" energy_cost_list.append(energy_cost)\n",
" \n",
" df['costs_cents'] = energy_cost_list\n",
" \n",
"apply_tariff_loop(df)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterrows approach"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"745 ms ± 42.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"def apply_tariff_iterrows(df):\n",
" energy_cost_list = []\n",
" for index, row in df.iterrows():\n",
" \n",
" # Get electricity used and hour of day\n",
" energy_used = row['energy_kwh']\n",
" hour = row['date_time'].hour\n",
" \n",
" # Append cost list\n",
" energy_cost = apply_tariff(energy_used, hour)\n",
" energy_cost_list.append(energy_cost)\n",
" \n",
" df['cost_cents'] = energy_cost_list\n",
" \n",
"apply_tariff_iterrows(df)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thats 745 ms compared to 3.4 seconds. Quicker by nearly a factor of 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using Pandas .apply() method"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"231 ms ± 9.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"def apply_tariff_withapply(df):\n",
" df['cost_cents'] = df.apply(\n",
" lambda row: apply_tariff(\n",
" kwh = row['energy_kwh'],\n",
" hour = row['date_time'].hour\n",
" ),\n",
" axis = 1\n",
" )\n",
" \n",
"apply_tariff_withapply(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thats 231 ms compared to 745 miliseconds. Quicker by a factor of 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting data with .isin()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"df.set_index('date_time', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"67.9 ns ± 0.886 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"def apply_tariff_isin(df):\n",
" \n",
" # Define hour range boolean arrays\n",
" peak_hours = df.index.hour.isin(range(17,24))\n",
" shoulder_hours = df.index.hour.isin(range(7, 17))\n",
" off_peak_hours = df.index.hour.isin(range(0, 7))\n",
" \n",
" # Apply tariffs to hour ranges\n",
" df.loc[peak_hours, 'cost_cents'] = df.loc[peak_hours, 'energy_kwh'] * 28\n",
" df.loc[shoulder_hours, 'cost_cents'] = df.loc[peak_hours, 'energy_kwh'] * 20\n",
" df.loc[offpeak_hours, 'cost_cents'] = df.loc[peak_hours, 'energy_kwh'] * 12"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thats 68 ns compared to 231 ms. Quicker by a factor of over 3,000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using pandas .cut()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([12, 12, 12, ..., 28, 28, 28])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.cut(x=df.index.hour,\n",
" bins = [0, 7, 17, 24],\n",
" include_lowest = True,\n",
" labels = [12, 20, 28]).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"74.4 ns ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"def apply_tariff_cut(df):\n",
" cents_per_kwh = pd.cut(x=df.index.hour,\n",
" bins = [0, 7, 17, 24],\n",
" include_lowest = True,\n",
" labels = [12, 20, 28]).astype(int)\n",
" df['cost_cents'] = cents_per_kwh * df['energy_kwh']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using numpy"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"771 µs ± 72.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"def apply_tariff_digitise(df):\n",
" prices = np.array([12, 20, 28])\n",
" bins = np.digitize(df.index.hour.values, bins=[7, 17, 24])\n",
" df['cost_cents'] = prices[bins] * df['energy_kwh'].values\n",
" \n",
"apply_tariff_digitise(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Storing processed data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a storage object"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"data_store = pd.HDFStore('processed_data.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save the dataframe ot the data store"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"data_store['processed_df'] = df\n",
"data_store.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read the data back in "
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"data_store = pd.HDFStore('processed_data.h5')\n",
"preprocessed_df = data_store['processed_df']\n",
"data_store.close()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>energy_kwh</th>\n",
" <th>cost_cents</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date_time</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01 00:00:00</th>\n",
" <td>0.586</td>\n",
" <td>7.032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-01 01:00:00</th>\n",
" <td>0.580</td>\n",
" <td>6.960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-01 02:00:00</th>\n",
" <td>0.572</td>\n",
" <td>6.864</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-01 03:00:00</th>\n",
" <td>0.596</td>\n",
" <td>7.152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-01 04:00:00</th>\n",
" <td>0.592</td>\n",
" <td>7.104</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" energy_kwh cost_cents\n",
"date_time \n",
"2013-01-01 00:00:00 0.586 7.032\n",
"2013-01-01 01:00:00 0.580 6.960\n",
"2013-01-01 02:00:00 0.572 6.864\n",
"2013-01-01 03:00:00 0.596 7.152\n",
"2013-01-01 04:00:00 0.592 7.104"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"preprocessed_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@ryanbehdad
Copy link

Thanks for sharing.
For storing dataframes, in addition to HDF5 and pickle, feather is also a good option.
https://github.com/wesm/feather

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment