@austinlyons
Last active August 16, 2017 22:57
IPython notebook: A choropleth depicting the percentage of public employees who are female in each county in Iowa.
{
"metadata": {
"name": "Iowa Choropleth Map"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# Making an Iowa choropleth map using Python\n\n## Preface\nI'm following along with this [FlowingData tutorial](http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/), with some slight tweaks. My choropleth will visualize the percentage of public employees who are female in each county. I [previously calculated these percentages with pandas](http://www.seeaustinhack.com/2013/10/13/using-pandas-to-analyze-iowa-public-salaries/) using [data I scraped](http://www.seeaustinhack.com/2013/09/06/obtaining-data-with-python-scrapy/) from the Des Moines Register's Iowa public salary database.\n\n## Code\nIf you just want to see the code, I put it on GitHub: https://github.com/austinlyons/iowa_female_public_employees\n\n## SVG\nTo get an SVG with Iowa and its counties, I took the [US county SVG from Wikimedia](http://upload.wikimedia.org/wikipedia/commons/5/5f/USA_Counties_with_FIPS_and_names.svg), opened the SVG in a text editor, and deleted everything but the Iowa counties. I edited the resulting map a bit with [Inkscape](http://inkscape.org/), making sure that each county path retained the county name and its FIPS code. To easily look up a county's FIPS code, I created a .csv of Iowa counties and their respective FIPS codes using [data from the EPA](http://www.epa.gov/envirofw/html/codes/ia.html)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# imports\nimport pandas as pd\nimport numpy as np\nfrom BeautifulSoup import BeautifulSoup",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": "# implement county->fips lookup table with a dataframe with the county as the index\nfips = pd.read_csv('iowa_county_fips.csv', index_col=0)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": "# are there 99 counties in this dataframe? sanity check\nprint fips",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "<class 'pandas.core.frame.DataFrame'>\nIndex: 99 entries, ADAIR to WRIGHT\nData columns (total 1 columns):\ncode 99 non-null values\ndtypes: int64(1)\n"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "# does the data look like we expect?\nprint fips[0:10]",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "             code\ncounty\nADAIR       19001\nADAMS       19003\nALLAMAKEE   19005\nAPPANOOSE   19007\nAUDUBON     19009\nBENTON      19011\nBLACK HAWK  19013\nBOONE       19015\nBREMER      19017\nBUCHANAN    19019\n"
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Iowa public employees dataframe: % female, % male, total # employees. County is set as the index\niowa_by_sex = pd.read_csv('iowa_public_employees_female_male_ratio.csv', index_col=0)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Let's see a summary of this data\nprint iowa_by_sex.describe()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "                F           M  Employee Count\ncount  100.000000  100.000000      100.000000\nmean     0.514018    0.485982      576.960000\nstd      0.107958    0.107958     1882.730061\nmin      0.274510    0.000000        1.000000\n25%      0.457567    0.431299       62.750000\n50%      0.507613    0.492387      125.500000\n75%      0.568701    0.542433      345.250000\nmax      1.000000    0.725490    15229.000000\n"
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": "print iowa_by_sex.loc['TAMA']",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "F 0.664311\nM 0.335689\nEmployee Count 283.000000\nName: TAMA, dtype: float64\n"
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": "# add code column to our iowa dataframe\niowa_by_sex['code'] = fips['code'].astype(int)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": "# pandas magic! We should see the code as a new column in this row.\n# adding a column is so simple since the two dataframes \n# each use county as their index\nprint iowa_by_sex.loc['TAMA']",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "F 0.664311\nM 0.335689\nEmployee Count 283.000000\ncode 19171.000000\nName: TAMA, dtype: float64\n"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "# load the blank Iowa SVG; the with block closes the file once it is read\nwith open('iowa_counties.svg', 'r') as f:\n    svg = f.read()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<img src=\"http://i.imgur.com/qdh2mJh.png?1\" width=\"50%\" height=\"50%\">"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# parse SVG, defining selfClosingTags as shown here: https://josephhall.org/nqb2/index.php/flwdchrplth\nsoup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": "# find counties using findAll\npaths = soup.findAll('path')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": "# The map colors in order from lightest to darkest\ncolors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": "# county base style. We'll add the fill color (at the end of this string) for each county path\npath_style=\"font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;fill:\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Color the counties based on percent of public employees that are female\nfor p in paths:\n    if p['id'] in [\"State_Lines\", \"separator\"]:  # only color the counties\n        continue\n\n    # look up this county's female share by FIPS code\n    try:\n        match = iowa_by_sex.loc[iowa_by_sex['code'] == int(p['id']), 'F']\n    except ValueError:  # path id is not a numeric FIPS code\n        continue\n    if len(match) == 0:  # no salary data for this county\n        continue\n    rate = match.iloc[0]  # scalar value, so the comparisons below are unambiguous\n\n    if rate > 0.833:\n        color_class = 5\n    elif rate > 0.666:\n        color_class = 4\n    elif rate > 0.5:\n        color_class = 3\n    elif rate > 0.333:\n        color_class = 2\n    elif rate > 0.166:\n        color_class = 1\n    else:\n        color_class = 0\n\n    color = colors[color_class]\n    p['style'] = path_style + color",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Save result; the with block closes the file even if the write fails\nwith open(\"iowa_counties_colored.svg\", \"wb\") as fo:\n    fo.write(soup.prettify())",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Output: an SVG of Iowa counties colored by the percentage of public employees who are female"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<img src=\"http://i.imgur.com/s22Y080.png\" width=\"60%\" height=\"60%\">"
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
}
],
"metadata": {}
}
]
}
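The threshold chain in the coloring cell can also be written as a table lookup. This is only a minimal sketch, assuming the same six colors and cut points the notebook uses; `color_for_rate`, `COLORS`, and `THRESHOLDS` are hypothetical names that do not appear in the original code.

```python
import bisect

# Palette from the notebook, lightest to darkest.
COLORS = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']

# Cut points copied from the notebook's if/elif chain (hypothetical constant name).
THRESHOLDS = [0.166, 0.333, 0.5, 0.666, 0.833]


def color_for_rate(rate):
    """Return the palette color for a female share between 0 and 1.

    bisect_left counts the thresholds strictly below `rate`, which matches
    the strict ">" comparisons in the notebook's if/elif chain.
    """
    return COLORS[bisect.bisect_left(THRESHOLDS, rate)]
```

With a helper like this, the body of the loop would reduce to `p['style'] = path_style + color_for_rate(rate)`, and changing the number of classes would only mean editing the two lists.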