@ladyrassilon
Created January 29, 2016 01:47
Age of US Presidents
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Ages of US Presidents\n",
"==============\n",
"\n",
"There has been some quite ageist commentary about the ages of the current presidential candidates, the most recent being a meme I saw on Facebook.\n",
"![Showing before and after 8 years in presidency, showing Bernie Sanders now, and a disintegrating corpse afterwards](https://dl.dropboxusercontent.com/u/8190353/12552682_1169487213097516_1888312663532546682_n.jpg)\n",
"\n",
"Democrats\n",
"--------\n",
"Clinton (68), Sanders (74), O'Malley (53), De La Fuente (61)\n",
"\n",
"GOP\n",
"---\n",
"Trump (69), Carson (64), Cruz (45), Fiorina (61), Paul (53), Bush (62), Christie (53), Huckabee (60), Gilmore (66), Rubio (44), Kasich (63), Santorum (57)\n",
"\n",
"Libertarian\n",
"----------\n",
"Gary Johnson (63)\n",
"\n",
"Green\n",
"-----\n",
"Jill Stein (65)\n",
"\n",
"So I was curious how that compares with the ages of presidents at their inaugurations and retirements throughout history, and I decided to do some scraping to get the numbers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step is to import the libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import requests # My favourite library for accessing content\n",
"import numpy as np #For the number crunching\n",
"from decimal import Decimal\n",
"from dateutil import parser #For dateparsing\n",
"from bs4 import BeautifulSoup #For the html processing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then grab the content from Wikipedia and put it into Beautiful Soup."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"response = requests.get(\"https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States_by_age\")\n",
"content = response.content\n",
"\n",
"soup = BeautifulSoup(content, \"html.parser\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now pull out the inauguration dates, centuries (more on these two in a sec) and the raw ages. This took a little tinkering with the rows, cells and spans to find the right values.\n",
"\n",
"This is far more verbose than my usual code, but I wanted to make it clear what I was doing for each piece of info."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"inauguration_dates = []\n",
"inauguration_centuries = []\n",
"\n",
"inauguration_ages = []\n",
"retirement_ages = []\n",
"\n",
"tables = soup.findAll(\"table\")\n",
"table = tables[0]\n",
"\n",
"for row in table.findAll(\"tr\")[1:]:\n",
"    tds = row.findAll(\"td\")\n",
"    if tds:\n",
"        # Age at inauguration, given as years-days\n",
"        in_age_cell = tds[4]\n",
"        in_age_spans = in_age_cell.findAll(\"span\")\n",
"        in_age_text = in_age_spans[0].text\n",
"        in_age = in_age_text.split(\"-\")\n",
"        in_num = Decimal(in_age[0]) + (Decimal(in_age[1])/Decimal(\"365.25\"))\n",
"        inauguration_ages.append(in_num)\n",
"\n",
"        # Age at retirement, in the same years-days form\n",
"        out_age_cell = tds[6]\n",
"        out_age_spans = out_age_cell.findAll(\"span\")\n",
"        out_age_text = out_age_spans[0].text\n",
"        out_age = out_age_text.split(\"-\")\n",
"        out_num = Decimal(out_age[0]) + (Decimal(out_age[1])/Decimal(\"365.25\"))\n",
"        retirement_ages.append(out_num)\n",
"\n",
"        # Inauguration date\n",
"        date_cell = tds[3]\n",
"        date_spans = date_cell.findAll(\"span\")\n",
"        date_text = date_spans[1].text\n",
"\n",
"        inauguration_date = parser.parse(date_text)\n",
"        inauguration_dates.append(inauguration_date)\n",
"\n",
"        # The first two digits of the year identify the century\n",
"        year = date_text.split(\"-\")[0]\n",
"        century = year[:2]\n",
"        inauguration_centuries.append(century)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now we have lists containing the inauguration centuries, inauguration dates, ages at inauguration, and ages at retirement, all in the same order, so let's rework them into a more usable format."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"century_map = {key:[] for key in set(inauguration_centuries)}\n",
"for idx, century in enumerate(inauguration_centuries):\n",
"    presidential_data = {\n",
"        \"inauguration\": inauguration_dates[idx],\n",
"        \"start_age\": inauguration_ages[idx],\n",
"        \"end_age\": retirement_ages[idx],\n",
"    }\n",
"    century_map[century].append(presidential_data)\n",
"\n",
"#We also want the data for all time\n",
"all_presidential_data = []\n",
"for idx in range(len(inauguration_dates)):\n",
"    presidential_data = {\n",
"        \"inauguration\": inauguration_dates[idx],\n",
"        \"start_age\": inauguration_ages[idx],\n",
"        \"end_age\": retirement_ages[idx],\n",
"    }\n",
"    all_presidential_data.append(presidential_data)\n",
"\n",
"#I'll use this in a sec for output\n",
"item_order = sorted(century_map.keys())\n",
"item_order = [\"All Time\"] + item_order\n",
"\n",
"century_map[\"All Time\"] = all_presidential_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's do some number crunching."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"century_data = {}\n",
"\n",
"for century, data in century_map.items():\n",
"    start_ages = [item[\"start_age\"] for item in data]\n",
"    end_ages = [item[\"end_age\"] for item in data if item[\"end_age\"]] # Some presidents didn't finish their terms.\n",
"    output_data = {\n",
"        \"start_mean\": np.mean(start_ages),\n",
"        \"start_median\": np.median(start_ages),\n",
"        \"start_max\": np.max(start_ages),\n",
"        \"start_min\": np.min(start_ages),\n",
"        \"end_mean\": np.mean(end_ages),\n",
"        \"end_median\": np.median(end_ages),\n",
"        \"end_max\": np.max(end_ages),\n",
"        \"end_min\": np.min(end_ages),\n",
"    }\n",
"    for k, v in output_data.items():\n",
"        output_data[k] = round(v, 2)\n",
"    century_data[century] = output_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's output the final data. Remember, keyword-argument string formatting is your friend."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All Time\n",
"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:55.14\n",
"Median:54.9\n",
"Max:69.96\n",
"Min:42.88\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:60.93\n",
"Median:60.18\n",
"Max:77.96\n",
"Min:50.35 \n",
"\n",
"\n",
"18th Century\n",
"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:59.26\n",
"Median:59.26\n",
"Max:61.34\n",
"Min:57.18\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:65.19\n",
"Median:65.19\n",
"Max:65.34\n",
"Min:65.03 \n",
"\n",
"\n",
"19th Century\n",
"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:55.23\n",
"Median:54.41\n",
"Max:68.06\n",
"Min:46.85\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:59.58\n",
"Median:58.98\n",
"Max:69.97\n",
"Min:51.96 \n",
"\n",
"\n",
"20th Century\n",
"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:55.01\n",
"Median:55.24\n",
"Max:69.96\n",
"Min:42.88\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:61.93\n",
"Median:60.99\n",
"Max:77.96\n",
"Min:50.35 \n",
"\n",
"\n",
"21st Century\n",
"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:51.0\n",
"Median:51.0\n",
"Max:54.54\n",
"Min:47.46\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:62.54\n",
"Median:62.54\n",
"Max:62.54\n",
"Min:62.54 \n",
"\n",
"\n"
]
}
],
"source": [
"for idx, key in enumerate(item_order):\n",
"    if not idx:\n",
"        print(u\"{century}\".format(century=key))\n",
"    else:\n",
"        number = int(key) + 1\n",
"        # Ordinal suffix: 11-13 take th, otherwise it depends on the last digit\n",
"        if 11 <= number % 100 <= 13:\n",
"            suffix = \"th\"\n",
"        else:\n",
"            suffix = {1: \"st\", 2: \"nd\", 3: \"rd\"}.get(number % 10, \"th\")\n",
"        print(u\"{century}{suffix} Century\".format(century=number, suffix=suffix))\n",
"    output_string = \"\"\"\n",
"Inauguration Dates\n",
"==================\n",
"Mean:{start_mean}\n",
"Median:{start_median}\n",
"Max:{start_max}\n",
"Min:{start_min}\n",
"\n",
"Retirement Dates\n",
"================\n",
"Mean:{end_mean}\n",
"Median:{end_median}\n",
"Max:{end_max}\n",
"Min:{end_min} \n",
"\n",
"\"\"\"\n",
"    print(output_string.format(**century_data[key]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, everyone except Bernie Sanders is actually inside the age range of previous presidents, and only about a year outside the age range of the 19th century presidents.\n",
"\n",
"Now on the one hand medical care has improved considerably over the generations, but on the other hand the day-to-day stress has increased.\n",
"\n",
"However, even in the case of Bernie Sanders, the hyperbole about age does seem unfounded.\n",
"\n",
"I could do a lot more with this data, including drawing in other data sources, but I just wanted to take a simple problem and use some basic scraping and data processing to get a rough answer to whether the ageism was reasonable."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
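
The age conversion in the scraping cell can be pulled out into a small helper, which makes it easy to check on its own. This is a minimal sketch, not part of the original notebook: it assumes the age text has the form `years-days` (which is what the `split("-")` in the notebook implies), and the names `age_to_years` and `DAYS_PER_YEAR` are hypothetical.

```python
from decimal import Decimal

# Assumption: an average year of 365.25 days, to account for leap years.
DAYS_PER_YEAR = Decimal("365.25")


def age_to_years(age_text):
    """Convert a 'years-days' string such as '57-067' into decimal years."""
    years, days = age_text.split("-")
    return Decimal(years) + Decimal(days) / DAYS_PER_YEAR


# Hypothetical sample input: 57 years and 67 days.
print(round(age_to_years("57-067"), 2))  # prints 57.18
```

Building the divisor from the string `"365.25"` rather than a float keeps the `Decimal` value exact.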