@AllenDowney
Created March 29, 2019 20:32
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using statsmodels lowess\n",
"\n",
"Copyright 2019 Allen B. Downey\n",
"\n",
"MIT License: https://opensource.org/licenses/MIT"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import random\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[This article](https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368) suggests that a smooth curve is a better way to show noisy polling data over time.\n",
"\n",
"Here's their before and after:\n",
"\n",
"![](https://cdn-images-1.medium.com/max/800/1*9GzHVtm4y_LeVmFCjqV3Ww.png)\n",
"\n",
"And here's their data:"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>% responding right</th>\n",
" <th>% responding wrong</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2016-02-08</th>\n",
" <td>2016-02-08</td>\n",
" <td>46</td>\n",
" <td>42</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-09-08</th>\n",
" <td>2016-09-08</td>\n",
" <td>45</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-08-17</th>\n",
" <td>2016-08-17</td>\n",
" <td>46</td>\n",
" <td>43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-08-23</th>\n",
" <td>2016-08-23</td>\n",
" <td>45</td>\n",
" <td>43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-08-31</th>\n",
" <td>2016-08-31</td>\n",
" <td>47</td>\n",
" <td>44</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date % responding right % responding wrong\n",
"Date \n",
"2016-02-08 2016-02-08 46 42\n",
"2016-09-08 2016-09-08 45 44\n",
"2016-08-17 2016-08-17 46 43\n",
"2016-08-23 2016-08-23 45 43\n",
"2016-08-31 2016-08-31 47 44"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('Economist_brexit.csv', header=3, parse_dates=[0])\n",
"df.index = df['Date']\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>% responding right</th>\n",
" <th>% responding wrong</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2018-08-13</th>\n",
" <td>2018-08-13</td>\n",
" <td>43</td>\n",
" <td>47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-08-14</th>\n",
" <td>2018-08-14</td>\n",
" <td>43</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-08-21</th>\n",
" <td>2018-08-21</td>\n",
" <td>41</td>\n",
" <td>47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-08-29</th>\n",
" <td>2018-08-29</td>\n",
" <td>42</td>\n",
" <td>47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-04-09</th>\n",
" <td>2018-04-09</td>\n",
" <td>42</td>\n",
" <td>48</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date % responding right % responding wrong\n",
"Date \n",
"2018-08-13 2018-08-13 43 47\n",
"2018-08-14 2018-08-14 43 45\n",
"2018-08-21 2018-08-21 41 47\n",
"2018-08-29 2018-08-29 42 47\n",
"2018-04-09 2018-04-09 42 48"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function uses StatsModels to fit a smooth LOWESS curve through a time series and puts the results back into a Pandas Series."
]
},
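{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"from statsmodels.nonparametric.smoothers_lowess import lowess\n",
"\n",
"def make_lowess(series):\n",
"    \"\"\"Fit a LOWESS curve to a date-indexed Series and return it as a Series.\"\"\"\n",
"    endog = series.values\n",
"    exog = series.index.values\n",
"\n",
"    # lowess returns a two-column array sorted by exog: (exog, fitted value)\n",
"    smooth = lowess(endog, exog)\n",
"    index, data = np.transpose(smooth)\n",
"\n",
"    # The dates come back as numbers, so convert them to datetimes again\n",
"    return pd.Series(data, index=pd.to_datetime(index))"
]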
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's what the graph looks like."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"options = dict(marker='o', linewidth=0, alpha=0.3, label='')\n",
"\n",
"df['% responding right'].plot(color='C0', **options)\n",
"df['% responding wrong'].plot(color='C1', **options)\n",
"\n",
"right = make_lowess(df['% responding right'])\n",
"right.plot(label='Right')\n",
"\n",
"wrong = make_lowess(df['% responding wrong'])\n",
"wrong.plot(label='Wrong')\n",
"\n",
"plt.legend();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
@robinovitch61

Nice! Loved her article!

@tomfbush

Thanks for this! For my use case I needed a slightly more sensitive smoothed line, so I made a tiny addition to the function: an extra argument that is passed to the frac parameter of the original lowess function. This means you can alter the 'wobbliness' each time you call the function, as required.
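
The comment does not include the modified code. A minimal sketch of what such a change might look like is below, assuming the extra argument is simply called `frac` (a hypothetical name) and defaults to 2/3, which is the default in statsmodels' `lowess`. The date conversion is written out explicitly here, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess


def make_lowess(series, frac=2/3):
    """Smooth a date-indexed Series with LOWESS.

    frac (hypothetical argument name) is the fraction of the data used
    for each local fit; smaller values follow the points more closely.
    """
    endog = series.values
    # Express the dates as nanoseconds since the epoch, as floats
    exog = series.index.values.astype('datetime64[ns]').astype('int64').astype(float)

    # Pass the caller's frac straight through to statsmodels
    smooth = lowess(endog, exog, frac=frac)
    index, data = np.transpose(smooth)

    # Turn the nanosecond values back into datetimes for the index
    return pd.Series(data, index=pd.to_datetime(index.astype('int64')))


# Made-up data purely for illustration (not the Economist polling data)
rng = np.random.default_rng(0)
dates = pd.date_range('2016-08-01', periods=60, freq='D')
values = 45 + 3 * np.sin(np.linspace(0, 6, 60)) + rng.normal(0, 1, 60)
series = pd.Series(values, index=dates)

smooth = make_lowess(series)            # default amount of smoothing
wobbly = make_lowess(series, frac=0.2)  # follows the points more closely
```

Calling the function without `frac` behaves like the original version, so existing calls in the notebook would not need to change.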

@AllenDowney

AllenDowney commented Jul 26, 2019 via email
