@saulshanabrook
Last active January 18, 2017 15:58
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:07:24.632250",
"start_time": "2016-12-11T00:07:24.605803"
},
"collapsed": false
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:07:25.101393",
"start_time": "2016-12-11T00:07:24.794314"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [2] used 0.230 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 42.523 MiB\n"
]
}
],
"source": [
"from ipython_memwatcher import MemWatcher\n",
"mw = MemWatcher()\n",
"mw.start_watching_memory()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ideal Point Estimation Testing\n",
"\n",
"We want to test our gradient-based method of ideal point estimation against the PyMC3 model."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:07:28.938723",
"start_time": "2016-12-11T00:07:27.071781"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [3] used 79.211 MiB RAM in 1.753s, peaked 0.000 MiB above current, total RAM usage 121.734 MiB\n"
]
}
],
"source": [
"import ideal_point.ideal_point as ip"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Test Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T03:51:58.380436",
"start_time": "2016-12-10T03:51:58.240842"
},
"collapsed": false
},
"outputs": [],
"source": [
"df = ip.test_data(20)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:21:44.873097",
"start_time": "2016-12-10T02:21:44.720807"
},
"collapsed": false
},
"outputs": [],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:25:32.871436",
"start_time": "2016-12-11T00:25:32.381168"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [18] used 9.652 MiB RAM in 0.356s, peaked 0.000 MiB above current, total RAM usage 4340.082 MiB\n"
]
}
],
"source": [
"from altair import Chart\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:25:31.130435",
"start_time": "2016-12-11T00:25:30.974587"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [17] used 1.191 MiB RAM in 0.016s, peaked 0.000 MiB above current, total RAM usage 4330.430 MiB\n"
]
}
],
"source": [
"def plot_params(params):\n",
"    \"\"\"Plot each parameter vector in its own column, one point per index.\"\"\"\n",
"    # Flatten {name: sequence} into long-form rows for Altair.\n",
"    rows = []\n",
"    for key, value in params.items():\n",
"        for i, v in enumerate(value):\n",
"            rows.append({\n",
"                'i': i,\n",
"                'param': key,\n",
"                'value': v\n",
"            })\n",
"    df = pd.DataFrame(rows)\n",
"    return Chart(df).mark_circle().encode(\n",
"        column='param:N',\n",
"        x='i:O',\n",
"        y='value:Q'\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PyMC3 ADVI"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:15:21.771406",
"start_time": "2016-12-10T02:15:17.032349"
},
"collapsed": false
},
"outputs": [],
"source": [
"pymc_params = ip.advi_params(ip.create_model(df)).means"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:15:22.785465",
"start_time": "2016-12-10T02:15:22.623627"
},
"collapsed": false
},
"outputs": [],
"source": [
"plot_params(pymc_params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `vote_ideology` should match the `legislator_ideology`: the first vote should be close to the first legislator, and the last vote should be close to the last legislator (in ideology)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PyMC3 Sampling"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:23:30.053375",
"start_time": "2016-12-10T02:22:17.667228"
},
"collapsed": false
},
"outputs": [],
"source": [
"from pymc3 import sample\n",
"\n",
"model = ip.create_model(df)\n",
"with model:\n",
"    trace = sample(2000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:23:30.216229",
"start_time": "2016-12-10T02:23:30.056288"
},
"collapsed": false
},
"outputs": [],
"source": [
"from pymc3 import forestplot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-10T02:23:31.060811",
"start_time": "2016-12-10T02:23:30.219276"
},
"collapsed": false
},
"outputs": [],
"source": [
"forestplot(trace)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Same problem here as with ADVI above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manual Gradient"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:14:14.428591",
"start_time": "2016-12-11T00:14:14.113074"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [13] used 0.238 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 5992.418 MiB\n"
]
}
],
"source": [
"from ideal_point.gradient import Gradient\n",
"from scipy.special import expit as logistic"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:14:16.315575",
"start_time": "2016-12-11T00:14:16.172393"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The line_profiler extension is already loaded. To reload it, use:\n",
" %reload_ext line_profiler\n",
"In [14] used 0.000 MiB RAM in 0.003s, peaked 0.000 MiB above current, total RAM usage 5992.418 MiB\n"
]
}
],
"source": [
"%load_ext line_profiler\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:25:47.316392",
"start_time": "2016-12-11T00:25:47.009489"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [20] used -135.172 MiB RAM in 0.170s, peaked 135.406 MiB above current, total RAM usage 4213.129 MiB\n"
]
}
],
"source": [
"g = Gradient(ip.test_data(30))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:25:58.036278",
"start_time": "2016-12-11T00:25:57.826675"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-720.30101696\n",
"-720.30101696\n",
"-478.285431624\n",
"-441.825295499\n",
"-420.980500451\n",
"-385.627159114\n",
"-324.802623769\n",
"-252.811290555\n",
"-200.873256169\n",
"-170.464783023\n",
"\n",
"In [21] used 0.934 MiB RAM in 0.075s, peaked 0.000 MiB above current, total RAM usage 4214.062 MiB\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Widget Javascript not detected. It may not be installed properly. Did you enable the widgetsnbextension? If not, then run \"jupyter nbextension enable --py --sys-prefix widgetsnbextension\"\n"
]
}
],
"source": [
"g.run()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2016-12-11T00:26:01.072789",
"start_time": "2016-12-11T00:26:00.916794"
},
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div class=\"vega-embed\" id=\"c113c9cd-ea77-4159-9de3-17dfd10cf854\"></div>\n",
"\n",
"<style>\n",
".vega-embed svg, .vega-embed canvas {\n",
" border: 1px dotted gray;\n",
"}\n",
"\n",
".vega-embed .vega-actions a {\n",
" margin-right: 6px;\n",
"}\n",
"</style>\n"
]
},
"metadata": {
"jupyter-vega": "#c113c9cd-ea77-4159-9de3-17dfd10cf854"
},
"output_type": "display_data"
},
{
"data": {
"application/javascript": [
"var spec = {\"mark\": \"circle\", \"encoding\": {\"x\": {\"field\": \"i\", \"type\": \"ordinal\"}, \"column\": {\"field\": \"param\", \"type\": \"nominal\"}, \"y\": {\"field\": \"value\", \"type\": \"quantitative\"}}, \"config\": {\"cell\": {\"width\": 500, \"height\": 350}}, \"data\": {\"values\": [{\"i\": 0, \"param\": \"vote_ideologies\", \"value\": 0.7238689188660034}, {\"i\": 1, \"param\": \"vote_ideologies\", \"value\": 0.9864073981522212}, {\"i\": 2, \"param\": \"vote_ideologies\", \"value\": 1.0468387844806417}, {\"i\": 3, \"param\": \"vote_ideologies\", \"value\": 1.2303563344307904}, {\"i\": 4, \"param\": \"vote_ideologies\", \"value\": 1.4095699020236472}, {\"i\": 5, \"param\": \"vote_ideologies\", \"value\": 1.5026541924075538}, {\"i\": 6, \"param\": \"vote_ideologies\", \"value\": 1.683459449630872}, {\"i\": 7, \"param\": \"vote_ideologies\", \"value\": 1.8074855216893626}, {\"i\": 8, \"param\": \"vote_ideologies\", \"value\": 2.163009422745624}, {\"i\": 9, \"param\": \"vote_ideologies\", \"value\": 2.3878209245231545}, {\"i\": 10, \"param\": \"vote_ideologies\", \"value\": 2.598320104840746}, {\"i\": 11, \"param\": \"vote_ideologies\", \"value\": 2.688778376408092}, {\"i\": 12, \"param\": \"vote_ideologies\", \"value\": 2.8661867112300907}, {\"i\": 13, \"param\": \"vote_ideologies\", \"value\": 3.201664318645701}, {\"i\": 14, \"param\": \"vote_ideologies\", \"value\": 2.7214855735308303}, {\"i\": 15, \"param\": \"vote_ideologies\", \"value\": 2.4867021671789207}, {\"i\": 16, \"param\": \"vote_ideologies\", \"value\": 2.472875250210437}, {\"i\": 17, \"param\": \"vote_ideologies\", \"value\": 2.1421931833532732}, {\"i\": 18, \"param\": \"vote_ideologies\", \"value\": 1.9487982145636569}, {\"i\": 19, \"param\": \"vote_ideologies\", \"value\": 1.7244384678806526}, {\"i\": 20, \"param\": \"vote_ideologies\", \"value\": 1.5343845533619274}, {\"i\": 21, \"param\": \"vote_ideologies\", \"value\": 1.3494125069761913}, {\"i\": 22, \"param\": 
\"vote_ideologies\", \"value\": 1.265922669690926}, {\"i\": 23, \"param\": \"vote_ideologies\", \"value\": 1.1029453747165046}, {\"i\": 24, \"param\": \"vote_ideologies\", \"value\": 1.0137779792881274}, {\"i\": 25, \"param\": \"vote_ideologies\", \"value\": 0.9027392141987254}, {\"i\": 26, \"param\": \"vote_ideologies\", \"value\": 0.7808764912438394}, {\"i\": 27, \"param\": \"vote_ideologies\", \"value\": 0.7168259100751055}, {\"i\": 28, \"param\": \"vote_ideologies\", \"value\": 0.45555181931154215}, {\"i\": 0, \"param\": \"legislator_ideologies\", \"value\": 3.0274116625749024}, {\"i\": 1, \"param\": \"legislator_ideologies\", \"value\": 2.766279759874206}, {\"i\": 2, \"param\": \"legislator_ideologies\", \"value\": 2.4653438814065205}, {\"i\": 3, \"param\": \"legislator_ideologies\", \"value\": 2.251108602962943}, {\"i\": 4, \"param\": \"legislator_ideologies\", \"value\": 1.9828997167202362}, {\"i\": 5, \"param\": \"legislator_ideologies\", \"value\": 1.6871869660807393}, {\"i\": 6, \"param\": \"legislator_ideologies\", \"value\": 1.442601280048146}, {\"i\": 7, \"param\": \"legislator_ideologies\", \"value\": 1.1961273914586594}, {\"i\": 8, \"param\": \"legislator_ideologies\", \"value\": 0.9840046698774239}, {\"i\": 9, \"param\": \"legislator_ideologies\", \"value\": 0.7702887325434579}, {\"i\": 10, \"param\": \"legislator_ideologies\", \"value\": 0.5797440529499934}, {\"i\": 11, \"param\": \"legislator_ideologies\", \"value\": 0.400551163901209}, {\"i\": 12, \"param\": \"legislator_ideologies\", \"value\": 0.23631347427358643}, {\"i\": 13, \"param\": \"legislator_ideologies\", \"value\": 0.07345155788236508}, {\"i\": 14, \"param\": \"legislator_ideologies\", \"value\": -0.10344339838385486}, {\"i\": 15, \"param\": \"legislator_ideologies\", \"value\": -0.26161359472066487}, {\"i\": 16, \"param\": \"legislator_ideologies\", \"value\": -0.42187707030420724}, {\"i\": 17, \"param\": \"legislator_ideologies\", \"value\": -0.5944109221973217}, {\"i\": 18, 
\"param\": \"legislator_ideologies\", \"value\": -0.7792006363012691}, {\"i\": 19, \"param\": \"legislator_ideologies\", \"value\": -1.0293319543607375}, {\"i\": 20, \"param\": \"legislator_ideologies\", \"value\": -1.296296864587514}, {\"i\": 21, \"param\": \"legislator_ideologies\", \"value\": -1.5690512975572333}, {\"i\": 22, \"param\": \"legislator_ideologies\", \"value\": -1.7010125361888917}, {\"i\": 23, \"param\": \"legislator_ideologies\", \"value\": -1.9430062038337634}, {\"i\": 24, \"param\": \"legislator_ideologies\", \"value\": -2.0558486367399333}, {\"i\": 25, \"param\": \"legislator_ideologies\", \"value\": -2.2292882589024132}, {\"i\": 26, \"param\": \"legislator_ideologies\", \"value\": -2.371797793226603}, {\"i\": 27, \"param\": \"legislator_ideologies\", \"value\": -2.482480281500808}, {\"i\": 28, \"param\": \"legislator_ideologies\", \"value\": -2.650137335418494}, {\"i\": 29, \"param\": \"legislator_ideologies\", \"value\": -2.760793952174754}, {\"i\": 0, \"param\": \"vote_biases\", \"value\": -3.1915079237657333}, {\"i\": 1, \"param\": \"vote_biases\", \"value\": -2.976076188960744}, {\"i\": 2, \"param\": \"vote_biases\", \"value\": -2.53637144038333}, {\"i\": 3, \"param\": \"vote_biases\", \"value\": -2.4091718258688477}, {\"i\": 4, \"param\": \"vote_biases\", \"value\": -2.2797188183618124}, {\"i\": 5, \"param\": \"vote_biases\", \"value\": -2.001203450965056}, {\"i\": 6, \"param\": \"vote_biases\", \"value\": -1.8477361302448365}, {\"i\": 7, \"param\": \"vote_biases\", \"value\": -1.597329482666332}, {\"i\": 8, \"param\": \"vote_biases\", \"value\": -1.5238047752970125}, {\"i\": 9, \"param\": \"vote_biases\", \"value\": -1.266992718187132}, {\"i\": 10, \"param\": \"vote_biases\", \"value\": -0.9690709511212404}, {\"i\": 11, \"param\": \"vote_biases\", \"value\": -0.621657182845019}, {\"i\": 12, \"param\": \"vote_biases\", \"value\": -0.26789680391863296}, {\"i\": 13, \"param\": \"vote_biases\", \"value\": 0.15043533131246672}, {\"i\": 14, 
\"param\": \"vote_biases\", \"value\": 0.44562032137378466}, {\"i\": 15, \"param\": \"vote_biases\", \"value\": 0.7214035508857959}, {\"i\": 16, \"param\": \"vote_biases\", \"value\": 1.105209089166033}, {\"i\": 17, \"param\": \"vote_biases\", \"value\": 1.2229450558809474}, {\"i\": 18, \"param\": \"vote_biases\", \"value\": 1.3992488585394502}, {\"i\": 19, \"param\": \"vote_biases\", \"value\": 1.4900281541134788}, {\"i\": 20, \"param\": \"vote_biases\", \"value\": 1.5784463829540871}, {\"i\": 21, \"param\": \"vote_biases\", \"value\": 1.6373244528243356}, {\"i\": 22, \"param\": \"vote_biases\", \"value\": 1.8330916441135159}, {\"i\": 23, \"param\": \"vote_biases\", \"value\": 1.8728115968186934}, {\"i\": 24, \"param\": \"vote_biases\", \"value\": 2.0530721706544233}, {\"i\": 25, \"param\": \"vote_biases\", \"value\": 2.206767630626554}, {\"i\": 26, \"param\": \"vote_biases\", \"value\": 2.3757598160965525}, {\"i\": 27, \"param\": \"vote_biases\", \"value\": 2.7741518265435325}, {\"i\": 28, \"param\": \"vote_biases\", \"value\": 3.0489465179870168}]}};\n",
"var selector = \"#c113c9cd-ea77-4159-9de3-17dfd10cf854\";\n",
"var type = \"vega-lite\";\n",
"\n",
"var output_area = this;\n",
"require(['nbextensions/jupyter-vega/index'], function(vega) {\n",
" vega.render(selector, spec, type, output_area);\n",
"}, function (err) {\n",
" if (err.requireType !== 'scripterror') {\n",
" throw(err);\n",
" }\n",
"});\n"
]
},
"metadata": {
"jupyter-vega": "#c113c9cd-ea77-4159-9de3-17dfd10cf854"
},
"output_type": "display_data"
},
{
"data": {
"image/png": ""
},
"metadata": {
"jupyter-vega": "#c113c9cd-ea77-4159-9de3-17dfd10cf854"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"In [22] used 0.000 MiB RAM in 0.019s, peaked 0.000 MiB above current, total RAM usage 4214.062 MiB\n"
]
}
],
"source": [
"plot_params(g.params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The optimizer keeps improving the log likelihood, but the estimates keep this basic pattern, which is wrong, as noted above for PyMC3 ADVI."
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"notify_time": "10",
"toc": {
"nav_menu": {
"height": "146px",
"width": "271px"
},
"navigate_menu": true,
"number_sections": false,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
},
"widgets": {
"state": {
"d236125aff9d4976a5ba7f32f964924f": {
"views": [
{
"cell_index": 23
}
]
}
},
"version": "1.2.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}