@AllenDowney
Created December 28, 2018 17:35
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Think Bayes\n",
"\n",
"This notebook presents example code and exercise solutions for Think Bayes.\n",
"\n",
"Copyright 2018 Allen B. Downey\n",
"\n",
"MIT License: https://opensource.org/licenses/MIT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Race, religion and politics\n",
"\n",
"In their November 3, 2018 issue, *The Economist* published the [following figure](https://www.economist.com/graphic-detail/2018/11/03/how-to-forecast-an-americans-vote) showing results from their analysis of data from [YouGov](https://today.yougov.com/).\n",
"\n",
"![title](./figs/economist.png)\n",
"\n",
"These results are probably based on logistic regression, or something like it. As an exercise in conditional probability, I will try to replicate their results using data from the General Social Survey (GSS). Rather than use a regression model, I will just count the number of respondents in each group.\n",
"\n",
"This is just an exercise; my results should not be taken too seriously. \n",
"\n",
"- First, I am using GSS data from the entire history of the survey, going back to 1972.\n",
"\n",
"- Second, many of the conditions I use are only rough matches for the condition *The Economist* uses.\n",
"\n",
"- Also, the way I am using the GSS does not make it a representative survey.\n",
"\n",
"- Finally, some of the conditional probabilities I compute are based on small sample sizes.\n",
"\n",
"The point of the exercise is to practice thinking about and computing conditional probabilities.\n",
"\n",
"Here are the libraries I'll use."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Configure Jupyter so figures appear in the notebook\n",
"%matplotlib inline\n",
"\n",
"# Configure Jupyter to display the assigned value after an assignment\n",
"%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are functions to compute probabilities, counts, and conditional probabilities."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def prob(A):\n",
" \"\"\"Probability of A\"\"\"\n",
" return A.mean()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def count(A):\n",
" \"\"\"Number of instances of A\"\"\"\n",
" return A.sum()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def conditional(A, B):\n",
" \"\"\"Conditional probability of A given B\"\"\"\n",
" return prob(A & B) / prob(B)"
]
},
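{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of these definitions, here is a sketch on a small made-up example (the Series `A` and `B` are hypothetical, not GSS data). It shows that `conditional(A, B)`, which computes P(A and B) / P(B), equals the fraction of rows where `A` is true among the rows where `B` is true, and that it is consistent with Bayes's theorem, P(A|B) = P(A) P(B|A) / P(B).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def prob(A):\n",
"    # Probability of A: the fraction of True values\n",
"    return A.mean()\n",
"\n",
"def conditional(A, B):\n",
"    # Conditional probability of A given B\n",
"    return prob(A & B) / prob(B)\n",
"\n",
"# Hypothetical toy data for 6 respondents\n",
"A = pd.Series([True, True, False, True, False, False])\n",
"B = pd.Series([True, True, True, False, False, True])\n",
"\n",
"p = conditional(A, B)   # (2/6) / (4/6) = 0.5\n",
"direct = A[B].mean()    # among the 4 rows where B is True, 2 have A: 0.5\n",
"bayes = prob(A) * conditional(B, A) / prob(B)  # Bayes's theorem\n",
"print(p, direct, bayes)\n",
"```\n",
"\n",
"Selecting rows with a boolean index, as in `A[B].mean()`, is an equivalent way to compute the same conditional probability."
]
},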
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### GSS data\n",
"\n",
"The GSS data I'm using is from [this extract](https://gssdataexplorer.norc.org/projects/54786)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>relig</th>\n",
" <th>srcbelt</th>\n",
" <th>region</th>\n",
" <th>adults</th>\n",
" <th>wtssall</th>\n",
" <th>ballot</th>\n",
" <th>cohort</th>\n",
" <th>feminist</th>\n",
" <th>polviews</th>\n",
" <th>partyid</th>\n",
" <th>race</th>\n",
" <th>sex</th>\n",
" <th>educ</th>\n",
" <th>age</th>\n",
" <th>indus10</th>\n",
" <th>occ10</th>\n",
" <th>id_</th>\n",
" <th>realinc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1972</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>0.4446</td>\n",
" <td>0</td>\n",
" <td>1949</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>16</td>\n",
" <td>23</td>\n",
" <td>5170</td>\n",
" <td>520</td>\n",
" <td>1</td>\n",
" <td>18951.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1972</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0.8893</td>\n",
" <td>0</td>\n",
" <td>1902</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>10</td>\n",
" <td>70</td>\n",
" <td>6470</td>\n",
" <td>7700</td>\n",
" <td>2</td>\n",
" <td>24366.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1972</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0.8893</td>\n",
" <td>0</td>\n",
" <td>1924</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>48</td>\n",
" <td>7070</td>\n",
" <td>4920</td>\n",
" <td>3</td>\n",
" <td>24366.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1972</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0.8893</td>\n",
" <td>0</td>\n",
" <td>1945</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>17</td>\n",
" <td>27</td>\n",
" <td>5170</td>\n",
" <td>800</td>\n",
" <td>4</td>\n",
" <td>30458.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1972</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0.8893</td>\n",
" <td>0</td>\n",
" <td>1911</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>61</td>\n",
" <td>6680</td>\n",
" <td>5020</td>\n",
" <td>5</td>\n",
" <td>50763.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year relig srcbelt region adults wtssall ballot cohort feminist \\\n",
"0 1972 3 3 3 1 0.4446 0 1949 0 \n",
"1 1972 2 3 3 2 0.8893 0 1902 0 \n",
"2 1972 1 3 3 2 0.8893 0 1924 0 \n",
"3 1972 5 3 3 2 0.8893 0 1945 0 \n",
"4 1972 1 3 3 2 0.8893 0 1911 0 \n",
"\n",
" polviews partyid race sex educ age indus10 occ10 id_ realinc \n",
"0 0 2 1 2 16 23 5170 520 1 18951.0 \n",
"1 0 1 1 1 10 70 6470 7700 2 24366.0 \n",
"2 0 3 1 2 12 48 7070 4920 3 24366.0 \n",
"3 0 1 1 2 17 27 5170 800 4 30458.0 \n",
"4 0 0 1 2 12 61 6680 5020 5 50763.0 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from utils import read_gss\n",
"\n",
"gss = read_gss('data/gss_bayes')\n",
"gss.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def replace_invalid(series, bad_vals, replacement=np.nan):\n",
" series.replace(bad_vals, replacement, inplace=True)\n",
" \n",
"replace_invalid(gss.partyid, [3, 7, 8, 9])\n",
"replace_invalid(gss.relig, [98, 99])\n",
"replace_invalid(gss.educ, [98, 99])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def values(series):\n",
" return series.value_counts().sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### relig\n",
"\n",
"https://gssdataexplorer.norc.org/projects/54786/variables/287/vshow"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0 35968\n",
"2.0 15181\n",
"3.0 1246\n",
"4.0 7254\n",
"5.0 1069\n",
"6.0 177\n",
"7.0 89\n",
"8.0 38\n",
"9.0 136\n",
"10.0 112\n",
"11.0 762\n",
"12.0 30\n",
"13.0 135\n",
"Name: relig, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.relig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### srcbelt\n",
"\n",
"https://gssdataexplorer.norc.org/projects/54786/variables/121/vshow"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 5572\n",
"2 8670\n",
"3 7113\n",
"4 9348\n",
"5 23583\n",
"6 8180\n",
"Name: srcbelt, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.srcbelt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### region\n",
"\n",
"https://gssdataexplorer.norc.org/projects/54786/variables/119/vshow"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 2976\n",
"2 9057\n",
"3 11502\n",
"4 4559\n",
"5 12039\n",
"6 4121\n",
"7 5923\n",
"8 3882\n",
"9 8407\n",
"Name: region, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.region)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### partyid\n",
"\n",
"https://gssdataexplorer.norc.org/projects/52787/variables/141/vshow"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0 9999\n",
"1.0 12942\n",
"2.0 7485\n",
"4.0 5462\n",
"5.0 9661\n",
"6.0 6063\n",
"Name: partyid, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.partyid)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### race\n",
"\n",
"https://gssdataexplorer.norc.org/projects/52787/variables/82/vshow"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 50340\n",
"2 8802\n",
"3 3324\n",
"Name: race, dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.race)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### sex\n",
"\n",
"https://gssdataexplorer.norc.org/projects/52787/variables/81/vshow"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 27562\n",
"2 34904\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values(gss.sex)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Missing data\n",
"\n",
"To keep things simple, I'm dropping rows that are missing any of the data I need."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(51397, 19)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"subset = gss.dropna(subset=['sex', 'race', 'partyid', 'region', 'relig', 'educ', 'realinc', 'age'])\n",
"subset.shape"
]
},
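{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the missing rows have to go, note that `isin` (like `==` and `<`) returns `False` for `NaN`. A respondent whose `partyid` was set to `NaN` would therefore be counted as not being a Democrat while still being included in the denominator. Here is a sketch on a tiny hypothetical DataFrame (the values are made up):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def prob(A):\n",
"    # Probability of A: the fraction of True values\n",
"    return A.mean()\n",
"\n",
"# Hypothetical toy data; NaN marks a missing or excluded answer\n",
"df = pd.DataFrame({'partyid': [0, 1, np.nan, 5, np.nan, 6]})\n",
"\n",
"# isin returns False for NaN, so missing rows count as not Democrat\n",
"before = prob(df.partyid.isin([0, 1, 2]))   # 2/6\n",
"\n",
"# Dropping the missing rows first restricts the denominator to valid answers\n",
"valid = df.dropna(subset=['partyid'])\n",
"after = prob(valid.partyid.isin([0, 1, 2]))  # 2/4\n",
"print(before, after)\n",
"```\n",
"\n",
"With the GSS data, the rows set to `NaN` include Independents and Other party, so after dropping them `prob(democrat)` is the fraction of Democrats among respondents classified with one of the two parties."
]
},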
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boolean variables\n",
"\n",
"The following line makes the columns from `subset` available as global variables. Kids, don't try this at home!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"globals().update(subset)"
]
},
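{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you would rather not write into `globals()`, one alternative is to keep the columns in the DataFrame and build each boolean Series from an explicit column reference. A sketch on a hypothetical three-row DataFrame `df`; with the real data you would use `subset` in its place:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for subset\n",
"df = pd.DataFrame({'race': [1, 2, 1],\n",
"                   'sex': [1, 2, 2],\n",
"                   'age': [25, 40, 29]})\n",
"\n",
"# Each condition names its column explicitly\n",
"white = df['race'] == 1\n",
"female = df['sex'] == 2\n",
"young = df['age'] < 30\n",
"\n",
"# Conditions combine with & exactly as before\n",
"print((white & female & young).sum())\n",
"```\n",
"\n",
"The cost is a little more typing; the benefit is that column names can no longer silently overwrite existing variables in the notebook."
]
},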
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I need boolean Series for each of the conditions *The Economist* uses."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8095414129229332"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"white = race==1\n",
"prob(white)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.14559215518415472"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"black = race==2\n",
"prob(black)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than \"born again, church-going Protestant\", I am just using \"Protestant\"."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5979921007062668"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prot = relig==1\n",
"prob(prot)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 28788\n",
"True 22609\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"male = sex==1\n",
"values(male)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 22609\n",
"True 28788\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"female = sex==2\n",
"values(female)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than \"age 25\", and I am using \"young\", defined as less than 30."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.1974045177734109"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"young = age<30\n",
"prob(young)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.13115551491332178"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rural = srcbelt==6\n",
"prob(rural)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6042376014164251"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urban = srcbelt.isin([1,2,5])\n",
"prob(urban)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For \"never attended college\" I am using 12 or fewer years of education. "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5199330700235423"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"no_college = educ<=12\n",
"prob(no_college)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For \"post-graduate degree\" I am using more than 17 years of education."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.07578263322762029"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"postgrad = educ>17\n",
"prob(postgrad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*The Economist* considers two specific incomes, $100,000 and $15,000. Instead, I define conditions for high and low income:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.07444014242076386"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"high_income = realinc>=80000\n",
"prob(high_income)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.3670836819269607"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"low_income = realinc<=15000\n",
"prob(low_income)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.19641224195964746"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"west = region.isin([8,9])\n",
"prob(west)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.2568048718796817"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"midwest = region.isin([3,4])\n",
"prob(midwest)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The GSS has 7 levels of party identification, with \"Independent\" in the middle. I exclude \"Independent\" and \"Other party\" and classify everyone else as Democrat or Republican. "
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5891005311594062"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"democrat = partyid.isin([0,1,2])\n",
"prob(democrat)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.4108994688405938"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"republican = partyid.isin([4,5,6])\n",
"prob(republican)"
]
},
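{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two `isin` conditions amount to collapsing the `partyid` scale into two groups. Here is a sketch of the same classification written as a lookup table, assuming the standard GSS coding (0 to 2 on the Democratic side, 3 Independent, 4 to 6 on the Republican side, 7 Other party). `Series.map` turns codes that are not in the dict into `NaN`, which matches excluding Independent and Other party. The sample codes are hypothetical:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Assumed GSS partyid coding: 0-2 Democratic side, 3 Independent,\n",
"# 4-6 Republican side, 7 Other party\n",
"PARTY = {0: 'Democrat', 1: 'Democrat', 2: 'Democrat',\n",
"         4: 'Republican', 5: 'Republican', 6: 'Republican'}\n",
"\n",
"# Hypothetical sample of codes\n",
"partyid = pd.Series([0, 3, 5, 7, 2, 6])\n",
"\n",
"# Codes missing from PARTY (3 and 7) become NaN\n",
"party = partyid.map(PARTY)\n",
"print(party.value_counts())\n",
"\n",
"# Among classified respondents, the two groups partition the sample\n",
"classified = party.dropna()\n",
"p_dem = (classified == 'Democrat').mean()\n",
"p_rep = (classified == 'Republican').mean()\n",
"```\n",
"\n",
"Because the two groups partition the classified respondents, `prob(democrat)` and `prob(republican)` above add up to 1."
]
},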
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The blue path\n",
"\n",
"Here are the conditional probabilities that make up the blue path in the figure."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"blue_path = pd.Series([]);"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Average'] = prob(democrat)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Black'] = conditional(democrat, black)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Protestant'] = conditional(democrat, black&prot)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Male'] = conditional(democrat, black&prot&male)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Young'] = conditional(democrat, black&prot&male&young)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"blue_path['Rural'] = conditional(democrat, black&prot&male&young&rural)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"blue_path['No college'] = conditional(democrat, black&prot&male&young&rural&no_college)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'm not able to compute the last two probabilities because there is no one in the dataset that matches the conditions."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count(black&prot&male&young&rural&no_college&high_income)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nevertheless, here are the results for the blue path."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Average 0.589101\n",
"Black 0.901376\n",
"Protestant 0.907050\n",
"Male 0.894713\n",
"Young 0.839367\n",
"Rural 0.821429\n",
"No college 0.894737\n",
"dtype: float64"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blue_path"
]
},
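{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above add one condition at a time by hand. Here is a sketch of the same idea as a loop, which also records how many respondents satisfy each cumulative condition, so steps based on very few respondents are easy to spot. The helper `path_table` and the toy data are hypothetical; `outcome[cond].mean()` computes the same value as `conditional(outcome, cond)`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def path_table(outcome, steps):\n",
"    # outcome: boolean Series; steps: list of (label, boolean Series)\n",
"    rows = [('Average', outcome.mean(), len(outcome))]\n",
"    cond = pd.Series(True, index=outcome.index)\n",
"    for label, step in steps:\n",
"        cond = cond & step  # accumulate conditions left to right\n",
"        n = int(cond.sum())\n",
"        # P(outcome | cond); NaN when no respondent matches\n",
"        p = outcome[cond].mean() if n > 0 else float('nan')\n",
"        rows.append((label, p, n))\n",
"    return pd.DataFrame(rows, columns=['step', 'prob', 'count']).set_index('step')\n",
"\n",
"# Hypothetical toy data\n",
"outcome = pd.Series([True, False, True, True, False, True])\n",
"a = pd.Series([True, True, True, False, True, True])\n",
"b = pd.Series([True, False, True, False, False, False])\n",
"table = path_table(outcome, [('A', a), ('A and B', b)])\n",
"print(table)\n",
"```\n",
"\n",
"For the blue path this would be something like `path_table(democrat, [('Black', black), ('Protestant', prot), ('Male', male)])`, continuing through the remaining conditions; the `count` column shows how quickly the sample shrinks."
]
},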
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's some code to plot the results."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def arrow(series, index, **options):\n",
" \"\"\"Draw an arrow showing the effect of a condition.\n",
" \n",
" series: Series that maps from label to probability\n",
" index: which step in the Series to plot\n",
" options: passed to plt.plot\n",
" \"\"\"\n",
" label = series.index[index]\n",
" y1 = index\n",
" y2 = index+1\n",
" x1 = series.iloc[index]\n",
" try:\n",
" x2 = series.iloc[index+1]\n",
" plt.plot([x1, x1, x2], [y1, y2, y2], **options)\n",
" plt.text(x1, y1, label)\n",
" except IndexError:\n",
" plt.text(x1, y1, label)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"for i in range(len(blue_path)):\n",
" arrow(blue_path, i, color='blue')\n",
" \n",
"plt.gca().invert_xaxis()\n",
"plt.gca().invert_yaxis()\n",
"plt.xlabel('Probability of Democrat');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most of these results are qualitatively consistent with *The Economist*, with a few exceptions:\n",
"\n",
"* The effect of \"Protestant\" is smaller and in the wrong direction, probably because my condition is not limited to \"born-again, church-going\" Protestants.\n",
"\n",
"* The effect of \"No college\" is in the wrong direction, but at this point in the analysis, we are down to a small number of respondents, so this result should not be taken seriously."
]
},
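{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this is a Think Bayes notebook, one way to quantify how seriously to take a step like *No college* is to put a credible interval on each proportion. Here is a minimal sketch, assuming a uniform prior on the proportion, so that after k successes out of n respondents the posterior is Beta(k+1, n-k+1). The helper `credible_interval` and the counts are hypothetical, not taken from the GSS:\n",
"\n",
"```python\n",
"from scipy.stats import beta\n",
"\n",
"def credible_interval(k, n, level=0.9):\n",
"    # Posterior for a proportion under a uniform Beta(1, 1) prior\n",
"    # after observing k successes in n trials\n",
"    posterior = beta(k + 1, n - k + 1)\n",
"    return posterior.interval(level)\n",
"\n",
"# Hypothetical counts: same observed proportion, different sample sizes\n",
"small = credible_interval(8, 10)\n",
"large = credible_interval(800, 1000)\n",
"print(small, large)\n",
"```\n",
"\n",
"Both hypothetical examples have the same observed proportion, 0.8, but the interval for 10 respondents is much wider than the interval for 1000. Applied to the last steps of each path, intervals like this show how little those steps can tell us."
]
},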
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Red path\n",
"\n",
"Here's the same analysis for the red path."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"red_path = pd.Series([]);"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"red_path['Average'] = prob(republican)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"red_path['White'] = conditional(republican, white)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"red_path['Protestant'] = conditional(republican, white&prot)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"red_path['Female'] = conditional(republican, white&prot&female)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"red_path['Young'] = conditional(republican, white&prot&female&young)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"red_path['Urban'] = conditional(republican, white&prot&female&young&urban)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"red_path['Postgrad'] = conditional(republican, white&prot&female&young&urban&postgrad)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"red_path['Low income'] = conditional(republican, white&prot&female&young&urban&postgrad&low_income)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"red_path['Midwest'] = conditional(republican, white&prot&female&young&urban&postgrad&low_income&midwest)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count(white&prot&female&young&urban&postgrad&low_income&midwest)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Average 0.410899\n",
"White 0.473899\n",
"Protestant 0.552054\n",
"Female 0.531102\n",
"Young 0.551869\n",
"Urban 0.537722\n",
"Postgrad 0.545455\n",
"Low income 0.600000\n",
"Midwest 0.500000\n",
"dtype: float64"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"red_path"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"for i in range(len(red_path)):\n",
" arrow(red_path, i, color='red')\n",
" \n",
"plt.gca().invert_yaxis()\n",
"plt.xlabel('Probability of Republican');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, some of the results are consistent with *The Economist*, but there are several exceptions:\n",
"\n",
"* The effect of \"Young\" is in the wrong direction, but my definition of young is different from theirs.\n",
"\n",
"* The effect of \"Postgrad\" is in the wrong direction, but it is based on a small sample size.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}