
@BenGeissel
Created December 17, 2019 19:48

How to Plot a Map in Python

Using Geopandas and Geoplot

Intro

At my previous job, I had to build maps quite often. There was never a particularly easy way to do this, so I decided to put my Python skills to the test and create one. I ran into quite a few speed bumps along the way, but I was eventually able to produce the map I intended to make. I believe that with more practice, mapping in Python will become very easy. I stumbled across Geopandas and Geoplot first, so those are what I use here; however, there are other Python libraries that produce nicer maps, such as Folium.

Decide What to Map

First, you have to decide what you would like to map and at what geographical level the information is available. I am interested in applying data science to environmental issues and sustainability, so I decided to look at some National Oceanic and Atmospheric Administration (NOAA) county-level data for the United States. I specifically chose maximum temperature by month for each county.

Second, you need to gather your data. On the NOAA climate division data website, I pulled the data I needed by clicking the "nClimDiv" dataset link. After unzipping the data into a local folder, I was ready to move on.

Third, you need a suitable Shapefile to plot your data on. A Shapefile stores the geometry of geographic features (here, county boundaries) along with their attributes. I retrieved a United States county-level Shapefile from the US Census TIGER/Line Shapefile database. Download the dataset and store it in the same local folder as the data you want to plot.

Map Prepwork

Shapefile

As mentioned above, I used the Python libraries Geopandas and Geoplot. I found that I also needed the Descartes library installed. To install these libraries, I ran the following commands from my terminal:

https://gist.github.com/0f32894b05548f9dc1627cfd293640ab

Now you can import these libraries as you would any other Python library (e.g. "import pandas as pd"). To load the Shapefile, use the following Geopandas (gpd) call:

https://gist.github.com/916ee4d88192424fe27564ba294458f8

Data file

To load the county-level data, I had a few more problems to solve. The file came from NOAA in a fixed-width format, in which each field occupies the same range of character positions on every line. I followed these steps to get the data into a workable format:

  • Store the field widths in a "length" list (the widths are provided in the fixed-width file's readme)

https://gist.github.com/64001a3d842d6f78be9c19c912a13e5b

  • Read in the fixed-width file and convert it to a CSV file using pandas

https://gist.github.com/c34b5f543632fe0b1aa45641d1df6a6a

  • Load in CSV file without headers

https://gist.github.com/76fb9dd55b0b018bb1d59a2299b42386

  • Create and add column names (provided in fixed width file readme)

https://gist.github.com/8e65c02598e02f03cb23c8c32c352ea7

  • Drop unnecessary index column

https://gist.github.com/2da529da771169880bc7a764bb4613c4
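The steps above can be sketched roughly as follows. This is not the code from the gists: the sample rows, the widths in "length", and the column names are all made up for illustration (the real ones come from the readme), and the sketch reads the text straight into a DataFrame with pandas.read_fwf instead of going through an intermediate CSV file.

```python
import io

import pandas as pd

# Synthetic rows in a made-up layout: state (2 chars), county (3),
# element (2), year (4), then three 7-character monthly values.
raw = (
    "01001272018  55.10  60.20  70.30\n"
    "01003272018  56.40  61.50  71.60\n"
)

# Hypothetical widths and names; take the real ones from the file's readme.
length = [2, 3, 2, 4, 7, 7, 7]
names = ["state", "county", "element", "year", "Jan", "Feb", "Mar"]

# Read the code columns as strings so their leading zeros are kept.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=length,
    header=None,
    names=names,
    dtype={"state": str, "county": str, "element": str},
)

print(df)
```

Passing the names directly to read_fwf means there is no extra index column to drop afterwards. If you do write a CSV along the way, to_csv(index=False) should avoid creating that column in the first place.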

Data Cleaning

There was quite a bit of data cleaning involved, so I'll only give a short overview. I wanted to restrict the Shapefile to the contiguous United States, so I needed to filter out the following state codes:

  • 02: Alaska
  • 15: Hawaii
  • 60: American Samoa
  • 66: Guam
  • 69: Northern Mariana Islands
  • 72: Puerto Rico
  • 78: Virgin Islands

https://gist.github.com/e6b67e11eb2b0efdd43672ad800ea1fe
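A minimal sketch of this kind of filter, using a plain pandas DataFrame as a stand-in for the county GeoDataFrame (the same boolean indexing works on both). The column name "STATEFP" is an assumption based on how Census TIGER/Line county files usually label the state code, and the rows are toy data.

```python
import pandas as pd

# State codes outside the contiguous United States (from the list above).
NON_CONTIGUOUS = ["02", "15", "60", "66", "69", "72", "78"]

# Toy stand-in for the county GeoDataFrame; "STATEFP" is assumed to hold
# the two-digit state code as a string.
counties = pd.DataFrame(
    {
        "STATEFP": ["01", "02", "15", "36", "72"],
        "NAME": ["Autauga", "Anchorage", "Honolulu", "Albany", "San Juan"],
    }
)

# Keep only the rows whose state code is NOT in the exclusion list.
contiguous = counties[~counties["STATEFP"].isin(NON_CONTIGUOUS)].reset_index(drop=True)

print(contiguous)
```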

Let's take a first look at the Shapefile:

https://gist.github.com/8639a9224fd4b026fe434dd8bde945bd

You can see all the counties in the contiguous United States.

Merging the Shapefile and Dataset

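The Shapefile and the dataset need a column in common so that each data row can be matched to a county shape. I decided to match on FIPS codes (a five-digit county identifier made of a two-digit state code followed by a three-digit county code). To create the FIPS codes in the Shapefile: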

https://gist.github.com/eb21ce43210614c3d8f8febdb8239fb1

To create the FIPS codes in the county data (Note: I filtered the data to only the year 2018 for simplicity):

https://gist.github.com/49ba2a2326bf9e34bdfc1581035581ee
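One way the key-building step could look on both sides, sketched with toy data. The column names are hypothetical, and the sketch assumes the state and county codes in the data are already FIPS codes that have merely lost their leading zeros; check the readme to confirm that for your file.

```python
import pandas as pd

# Shapefile side: codes are assumed to be zero-padded strings already.
shapes = pd.DataFrame({"STATEFP": ["01", "36"], "COUNTYFP": ["001", "001"]})
shapes["FIPS"] = shapes["STATEFP"] + shapes["COUNTYFP"]

# Data side: codes stored as integers, so the leading zeros are gone.
temps = pd.DataFrame(
    {
        "state": [1, 1, 36],
        "county": [1, 1, 1],
        "year": [2017, 2018, 2018],
        "Aug": [91.0, 92.5, 80.3],  # made-up temperatures
    }
)

# Keep only 2018, then rebuild the five-digit key by zero-padding each part.
temps_2018 = temps[temps["year"] == 2018].copy()
temps_2018["FIPS"] = (
    temps_2018["state"].astype(str).str.zfill(2)
    + temps_2018["county"].astype(str).str.zfill(3)
)

print(shapes["FIPS"].tolist(), temps_2018["FIPS"].tolist())
```

Keeping the keys as strings on both sides matters here: an integer key would turn "01001" into 1001 and the two tables would no longer line up.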

Finally, to merge the Shapefile and Dataset:

https://gist.github.com/52d79e67654173bc404d47d55912c586
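A hedged sketch of such a merge, again with plain DataFrames and made-up values standing in for the real tables. Using a left join with indicator=True is my own suggestion rather than something taken from the gist: it keeps every county shape and flags the ones that found no matching data row.

```python
import pandas as pd

# Toy stand-ins; in practice the left table would be the county GeoDataFrame.
shapes = pd.DataFrame({"FIPS": ["01001", "01003", "36001"]})
temps_2018 = pd.DataFrame({"FIPS": ["01001", "01003"], "Aug": [90.1, 89.5]})

# Left join on the shared key; "_merge" records where each row came from.
merged = shapes.merge(temps_2018, on="FIPS", how="left", indicator=True)

# Counties present in the Shapefile but absent from the data.
missing = merged[merged["_merge"] == "left_only"]

print(merged)
print("Counties without data:", missing["FIPS"].tolist())
```

Listing the unmatched counties before plotting is a quick way to see whether gaps in the map come from the data itself or from keys that failed to match.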

Mapping (Finally!)

Finally, we get to map the data onto the Shapefile. I used geoplot.choropleth to shade each county by its maximum temperature: the darker the red, the hotter the maximum temperature for that county. The map shows August 2018.

https://gist.github.com/ad8496a84fc332b284cba0f637906f50
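A rough sketch of what a call like this might look like, wrapped in a hypothetical helper named plot_max_temp. The column name "Aug" and the decision to drop counties with missing values before plotting are assumptions of mine, not details from the gist, and the function needs geoplot installed and a merged GeoDataFrame to actually run.

```python
def plot_max_temp(merged, column="Aug", figsize=(12, 8)):
    """Shade each county by `column` (hypothetical helper, not the gist's code)."""
    # Imported lazily so this function can be defined without geoplot installed.
    import geoplot

    # Assumption: rows without a value are dropped rather than drawn.
    data = merged.dropna(subset=[column])

    ax = geoplot.choropleth(
        data,
        hue=column,    # column that drives the color of each county
        cmap="Reds",   # darker red means a hotter maximum temperature
        figsize=figsize,
    )
    ax.set_title(f"Maximum temperature by county ({column})")
    return ax
```

With the merged GeoDataFrame in hand, calling plot_max_temp(merged) followed by matplotlib's plt.show() should display the map.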

Yay!

You can see we were able to plot the data on the county map of the US! I hope this demonstration helps!

Problems

Unfortunately, you can see there is missing data. Additionally, I was able to generate a legend, but it showed up at about twice the size of the map itself, so I decided to remove it.
