Last active
August 16, 2017 22:57
IPython notebook: a choropleth depicting the percentage of public employees who are female in each county in Iowa.
{
"metadata": {
"name": "Iowa Choropleth Map"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# Making an Iowa choropleth map using Python\n\n## Preface\nI'm following along with this [flowingdata tutorial](http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/), with some slight tweaks. My choropleth will visualize the percentage of public employees who are female in each county. I [previously calculated these percentages with pandas](http://www.seeaustinhack.com/2013/10/13/using-pandas-to-analyze-iowa-public-salaries/) using [data I scraped](http://www.seeaustinhack.com/2013/09/06/obtaining-data-with-python-scrapy/) from the Des Moines Register's Iowa public salary database.\n\n## Code\nIf you just want to see the code, I put it on GitHub: https://github.com/austinlyons/iowa_female_public_employees\n\n## SVG\nTo get an SVG of Iowa and its counties, I took the [US county SVG from Wikimedia](http://upload.wikimedia.org/wikipedia/commons/5/5f/USA_Counties_with_FIPS_and_names.svg), opened it in a text editor, and deleted everything but the Iowa counties. I edited the resulting map a bit with [Inkscape](http://inkscape.org/), making sure that each county path retained the county name and its FIPS code. To easily look up a county's FIPS code, I created a .csv of Iowa counties and their FIPS codes using [data from the EPA](http://www.epa.gov/envirofw/html/codes/ia.html)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# imports\nimport pandas as pd\nimport numpy as np\nfrom BeautifulSoup import BeautifulSoup",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": "# implement county->fips lookup table with a dataframe with the county as the index\nfips = pd.read_csv('iowa_county_fips.csv', index_col=0)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": "# are there 99 counties in this dataframe? sanity check\nprint fips",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "<class 'pandas.core.frame.DataFrame'>\nIndex: 99 entries, ADAIR to WRIGHT\nData columns (total 1 columns):\ncode 99 non-null values\ndtypes: int64(1)\n"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "# does the data look like we expect?\nprint fips[0:10]",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " code\ncounty \nADAIR 19001\nADAMS 19003\nALLAMAKEE 19005\nAPPANOOSE 19007\nAUDUBON 19009\nBENTON 19011\nBLACK HAWK 19013\nBOONE 19015\nBREMER 19017\nBUCHANAN 19019\n"
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Iowa public employees dataframe: % female, % male, total # employees. County is set as the index\niowa_by_sex = pd.read_csv('iowa_public_employees_female_male_ratio.csv', index_col=0)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Let's see a summary of this data\nprint iowa_by_sex.describe()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " F M Employee Count\ncount 100.000000 100.000000 100.000000\nmean 0.514018 0.485982 576.960000\nstd 0.107958 0.107958 1882.730061\nmin 0.274510 0.000000 1.000000\n25% 0.457567 0.431299 62.750000\n50% 0.507613 0.492387 125.500000\n75% 0.568701 0.542433 345.250000\nmax 1.000000 0.725490 15229.000000\n"
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": "print iowa_by_sex.loc['TAMA']",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "F 0.664311\nM 0.335689\nEmployee Count 283.000000\nName: TAMA, dtype: float64\n"
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": "# add code column to our iowa dataframe\niowa_by_sex['code'] = fips['code'].astype(int)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": "# The assignment aligned the two dataframes on their shared county index,\n# so the FIPS code now shows up as a new entry in this row.\n# It prints as a float because a row Series holds a single dtype.\nprint iowa_by_sex.loc['TAMA']",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "F 0.664311\nM 0.335689\nEmployee Count 283.000000\ncode 19171.000000\nName: TAMA, dtype: float64\n"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "# load the blank Iowa SVG\nwith open('iowa_counties.svg', 'r') as f:\n    svg = f.read()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<img src=\"http://i.imgur.com/qdh2mJh.png?1\" width=\"50%\" height=\"50%\">"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# parse SVG, defining selfClosingTags as shown here: https://josephhall.org/nqb2/index.php/flwdchrplth\nsoup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": "# find counties using findAll\npaths = soup.findAll('path')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": "# The map colors in order from lightest to darkest\ncolors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": "# county base style. We'll add the fill color (at the end of this string) for each county path\npath_style=\"font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;fill:\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Color the counties based on percent of public employees that are female\nfor p in paths:\n    county_id = p.get('id')\n    if county_id in (None, \"State_Lines\", \"separator\"):  # only color the counties\n        continue\n    try:\n        match = iowa_by_sex[iowa_by_sex['code'] == int(county_id)]['F']\n    except ValueError:  # id is not a numeric FIPS code\n        continue\n    if len(match) == 0:  # no data for this county\n        continue\n    rate = match.iloc[0]  # scalar fraction, not a one-row Series\n\n    if rate > 0.833:\n        color_class = 5\n    elif rate > 0.666:\n        color_class = 4\n    elif rate > 0.5:\n        color_class = 3\n    elif rate > 0.333:\n        color_class = 2\n    elif rate > 0.166:\n        color_class = 1\n    else:\n        color_class = 0\n\n    color = colors[color_class]\n    p['style'] = path_style + color",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Save result\nwith open(\"iowa_counties_colored.svg\", \"wb\") as fo:\n    fo.write(soup.prettify())",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Result\nThe output is an SVG of Iowa counties colored by the percentage of public employees who are female."
},
{
"cell_type": "markdown",
"metadata": {},
"source": "<img src=\"http://i.imgur.com/s22Y080.png\" width=\"60%\" height=\"60%\">"
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
}
],
"metadata": {}
}
]
}
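
The notebook's coloring step comes down to two small pieces: mapping a rate to one of six color classes, and writing that color into each county path. Below is a minimal Python 3 sketch of both. It uses only the standard library (`bisect` and `xml.etree.ElementTree`) rather than the BeautifulSoup 3 calls the notebook relies on. The names `color_for_rate` and `color_svg`, the shortened style string, and the two-path sample SVG are made up for illustration. The palette, the thresholds, and TAMA's rate are taken from the notebook.

```python
import bisect
import xml.etree.ElementTree as ET

# Same palette and cut points as the notebook's coloring cell.
COLORS = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']
THRESHOLDS = [0.166, 0.333, 0.5, 0.666, 0.833]


def color_for_rate(rate):
    """Return the fill color for a fraction of female employees (0.0 to 1.0)."""
    # bisect_left counts the thresholds strictly below rate, which matches
    # the notebook's chain of `rate > threshold` comparisons.
    return COLORS[bisect.bisect_left(THRESHOLDS, rate)]


def color_svg(svg_text, rate_by_fips):
    """Set a fill on every <path> whose id is a FIPS code in rate_by_fips."""
    root = ET.fromstring(svg_text)
    for elem in root.iter():
        # Strip any XML namespace so both 'path' and '{ns}path' match.
        if elem.tag.rsplit('}', 1)[-1] != 'path':
            continue
        rate = rate_by_fips.get(elem.get('id'))
        if rate is not None:
            # Shortened, hypothetical base style; the notebook's is longer.
            elem.set('style', 'stroke:#000000;stroke-width:0.1;fill:'
                     + color_for_rate(rate))
    return ET.tostring(root, encoding='unicode')


# Hypothetical two-path SVG standing in for iowa_counties.svg.
SAMPLE_SVG = ('<svg><path id="19171" d="M0 0"/>'
              '<path id="State_Lines" d="M0 0"/></svg>')
colored = color_svg(SAMPLE_SVG, {'19171': 0.664311})  # TAMA's rate
```

Paths whose id is not in the lookup, such as `State_Lines`, are left untouched. On a real Inkscape file, ElementTree rewrites namespace prefixes on output unless they are registered with `ET.register_namespace`, so that file would likely need this extra step.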