At my previous job, I had to build maps quite often. There was never a particularly easy way to do this, so I decided to put my Python skills to the test and create one. I ran into quite a few speed bumps along the way, but was eventually able to produce the map I intended to make. I believe that with more practice, mapping in Python will become very easy. I originally stumbled across Geopandas and Geoplot for mapping, which I use here; however, there are other Python libraries out there that produce nicer maps, such as Folium.
First, you have to decide what you would like to map and at what geographical level the information is available. I am interested in applying data science to environmental issues and sustainability, so I decided to take a look at some National Oceanic and Atmospheric Administration (NOAA) county-level data for the United States. I specifically chose to look at maximum temperature by month for each county.
Second, you need to gather your data. From the NOAA climate division data website, I was able to pull the data I needed by clicking on the "nClimDiv" dataset link. After unzipping this data into a local folder I was ready to move on for now.
Third, you need to gather a proper Shapefile to plot your data. If you don't know what a Shapefile is, this link will help to explain its purpose. I was able to retrieve a United States county-level Shapefile from the US Census TIGER/Line Shapefile Database. Download the proper dataset and store it in the same local folder as the data you want to plot.
As mentioned above, I used the Python libraries Geopandas and Geoplot. I also found that I needed the Descartes library installed. To install these libraries, I had to run the following bash commands from my terminal:
https://gist.github.com/0f32894b05548f9dc1627cfd293640ab
Now you will be able to import these libraries as you would any other Python library (e.g. "import pandas as pd"). To load the Shapefile, you can use the following Geopandas (gpd) method:
https://gist.github.com/916ee4d88192424fe27564ba294458f8
Loading the county-level data posed a few more problems. The file came from NOAA in a fixed-width file format. For more information on fixed-width file formats, check out the following website. I followed these steps to get the data into a workable format:
- Store the fixed widths in a "length" list (provided in the fixed-width file readme)
https://gist.github.com/64001a3d842d6f78be9c19c912a13e5b
- Read in the fixed-width file and convert it to a CSV file using pandas
https://gist.github.com/c34b5f543632fe0b1aa45641d1df6a6a
- Load in CSV file without headers
https://gist.github.com/76fb9dd55b0b018bb1d59a2299b42386
- Create and add column names (provided in the fixed-width file readme)
https://gist.github.com/8e65c02598e02f03cb23c8c32c352ea7
- Drop unnecessary index column
https://gist.github.com/2da529da771169880bc7a764bb4613c4
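As a rough sketch of the steps above (not the exact code from the gists), pandas can read a fixed-width file directly with `read_fwf`. The widths, column names, and the two sample rows below are assumptions made up for illustration; the real values come from the readme that ships with the data. This version also reads straight into a DataFrame and skips the intermediate CSV, so there is no index column to drop.

```python
import io

import pandas as pd

# Assumed layout (check the readme): state code, county code, element code,
# year, then twelve monthly values. These widths are illustrative only.
length = [2, 3, 2, 4] + [7] * 12
columns = ["state", "county", "element", "year"] + [
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
]


def make_row(state, county, element, year, temps):
    """Build one made-up fixed-width line matching the assumed layout."""
    return f"{state}{county}{element}{year}" + "".join(f"{t:7.2f}" for t in temps)


# Two invented rows standing in for the real NOAA file.
sample = "\n".join([
    make_row("01", "001", "27", "2018", [55.1 + i for i in range(12)]),
    make_row("01", "003", "27", "2018", [60.0 + i for i in range(12)]),
])

# Read the code columns as strings so leading zeros are not lost.
df = pd.read_fwf(
    io.StringIO(sample),
    widths=length,
    header=None,
    names=columns,
    dtype={"state": str, "county": str, "element": str},
)
print(df.head())
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the path to the unzipped data file.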
There was also quite a bit of data cleaning involved, so I'll only give a short overview. I wanted to limit the Shapefile to the contiguous United States, so I needed to filter out the following state codes:
- 02: Alaska
- 15: Hawaii
- 60: American Samoa
- 66: Guam
- 69: Northern Mariana Islands
- 72: Puerto Rico
- 78: Virgin Islands
https://gist.github.com/e6b67e11eb2b0efdd43672ad800ea1fe
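A minimal sketch of this kind of filter is below. It uses a plain pandas DataFrame as a stand-in for the Shapefile's attribute table, since a GeoDataFrame is a DataFrame subclass and supports the same boolean indexing. The `STATEFP` column name is an assumption based on the Census TIGER/Line county files, and the rows are illustrative.

```python
import pandas as pd

# Stand-in for the Shapefile's attribute table (illustrative rows only).
shapes = pd.DataFrame({
    "STATEFP": ["01", "02", "06", "15", "72", "48"],
    "NAME": ["Autauga", "Anchorage", "Alameda", "Honolulu", "Ponce", "Travis"],
})

# State codes outside the contiguous United States, from the list above.
non_contiguous = ["02", "15", "60", "66", "69", "72", "78"]

# Keep only the rows whose state code is NOT in the excluded list.
contiguous = shapes[~shapes["STATEFP"].isin(non_contiguous)].reset_index(drop=True)
print(contiguous)
```

Keeping the codes as strings matters here: if they were parsed as integers, "02" would become 2 and the `isin` check against the string list would match nothing.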
Let's take a first look at the Shapefile:
https://gist.github.com/8639a9224fd4b026fe434dd8bde945bd
You can see all the counties in the contiguous United States.
The Shapefile and the dataset need a column in common so that the data can be matched to the map. I decided to match on FIPS codes. To create the FIPS codes in the Shapefile:
https://gist.github.com/eb21ce43210614c3d8f8febdb8239fb1
To create the FIPS codes in the county data (Note: I filtered the data to only the year 2018 for simplicity):
https://gist.github.com/49ba2a2326bf9e34bdfc1581035581ee
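To illustrate the idea, here is a small sketch that builds a five-character FIPS code (two-digit state plus three-digit county) on both sides. The column names and values are hypothetical. It also assumes the state code in the county data is already a FIPS state code; if the data's readme defines its own state numbering, that would need to be mapped to FIPS first.

```python
import pandas as pd

# Hypothetical Shapefile attributes, with codes already stored as strings.
shapes = pd.DataFrame({"STATEFP": ["01", "06"], "COUNTYFP": ["001", "075"]})
shapes["FIPS"] = shapes["STATEFP"] + shapes["COUNTYFP"]

# Hypothetical county data where the codes were parsed as integers,
# so the leading zeros have to be restored with zfill.
county = pd.DataFrame({
    "state": [1, 6, 6],
    "county": [1, 75, 75],
    "year": [2018, 2018, 2017],
    "aug": [91.3, 70.2, 69.8],
})

# Keep only 2018, then build the zero-padded FIPS code.
county_2018 = county[county["year"] == 2018].copy()
county_2018["FIPS"] = (
    county_2018["state"].astype(str).str.zfill(2)
    + county_2018["county"].astype(str).str.zfill(3)
)
print(county_2018[["FIPS", "aug"]])
```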
Next, to merge the Shapefile and the dataset:
https://gist.github.com/52d79e67654173bc404d47d55912c586
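A merge along these lines might look like the sketch below, which again uses plain DataFrames with made-up values in place of the real Shapefile and NOAA data. The choice of a left join is my assumption, not necessarily what the gist does: it keeps every county shape and leaves NaN where no temperature row matches.

```python
import pandas as pd

# Made-up stand-ins for the filtered Shapefile and the 2018 county data.
shapes = pd.DataFrame({
    "FIPS": ["01001", "06075", "48453"],
    "NAME": ["Autauga", "San Francisco", "Travis"],
})
temps = pd.DataFrame({"FIPS": ["01001", "06075"], "aug": [91.3, 70.2]})

# Left join: one row per county shape, with temperature where available.
merged = shapes.merge(temps, on="FIPS", how="left")
print(merged)
```

When the left side is a GeoDataFrame, the merged result keeps its geometry column, so it can be passed on for plotting.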
Finally, we get to map the data onto the Shapefile. I used the geoplot.choropleth method to map the maximum temperature data on a color scale. The darker the red, the hotter the maximum temperature was for a given county. The map was created for August 2018.
https://gist.github.com/ad8496a84fc332b284cba0f637906f50
You can see we were able to plot the data on the county map of the US! I hope this demonstration helps!
Unfortunately, you can see there is missing data. Additionally, I was able to generate a legend, but it showed up at about twice the size of the map itself, so I decided to remove it.
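One way to investigate the missing data is to check which FIPS codes fail to match between the two tables. The sketch below is a suggestion rather than something I ran on the real data; the keys are invented, and it relies on the `indicator=True` option of pandas `merge`, which adds a `_merge` column showing which side each row came from.

```python
import pandas as pd

# Invented keys: one county has a shape but no data, one has data but no shape.
shape_keys = pd.DataFrame({"FIPS": ["01001", "06075", "48453"]})
data_keys = pd.DataFrame({"FIPS": ["01001", "06075", "04075"]})

# An outer join with indicator=True labels each row as
# "both", "left_only", or "right_only".
check = shape_keys.merge(data_keys, on="FIPS", how="outer", indicator=True)

no_data = check.loc[check["_merge"] == "left_only", "FIPS"].tolist()
no_shape = check.loc[check["_merge"] == "right_only", "FIPS"].tolist()

print("Counties with a shape but no data:", no_data)
print("Data rows with no matching shape:", no_shape)
```

If many codes show up on the data-only side, the two sources are probably not building their county codes the same way.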