Public Use Microdata Areas (PUMAs) are statistical geographic areas defined by the US Census Bureau. For certain types of analysis, it is useful to determine which PUMAs are adjacent to each other. The following instructions show how to download the PUMA boundaries, use QGIS to calculate neighboring PUMAs, and export the results as a CSV file. The same technique can be used with other boundary polygons.
The Census Bureau publishes PUMA boundary shapefiles by state. Here's the directory of all the 2010 PUMAs, 2019 vintage: https://www2.census.gov/geo/tiger/TIGER2019/PUMA/
You can manually download all the files one at a time, or automate the process using wget:
wget -A zip -r -l1 ftp://ftp2.census.gov/geo/tiger/TIGER2019/PUMA/
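About the wget command-line options used above:
- -A zip: limits the download to files with the .zip extension
- -r: means recursive (follow the links found in the directory)
- -l1: limits the recursion to one level deep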
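If wget is not available, the same download can be sketched in Python with only the standard library. This is a minimal sketch, not a tested replacement for the command above: it assumes the HTTPS directory page is a plain HTML listing with ordinary links to the zipfiles, and the helper names (`zip_links`, `download_all`) and the `puma_zips` folder are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

# Directory of 2010 PUMAs, 2019 vintage (the HTTPS URL given in the text).
BASE_URL = "https://www2.census.gov/geo/tiger/TIGER2019/PUMA/"


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def zip_links(listing_html, base_url=BASE_URL):
    """Return absolute URLs for every .zip link in a directory listing."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    return [urljoin(base_url, h) for h in parser.hrefs
            if h.lower().endswith(".zip")]


def download_all(dest_dir="puma_zips", base_url=BASE_URL):
    """Download each linked zipfile into dest_dir (hypothetical folder name)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urlopen(base_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for url in zip_links(html, base_url):
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():  # skip files that were already downloaded
            urlretrieve(url, str(target))
```

Calling `download_all()` would fetch the files. Like the `-A zip` and `-l1` options, the link filter keeps only zipfiles and does not descend into subdirectories.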
Before calculating neighboring PUMAs, we'll want to merge the separate state files into a single nationwide data file. Note that QGIS can read the zipfiles without us having to extract them first. (Although extracting is usually recommended if you are not immediately saving to another format.)
- In QGIS, under the Processing menu, open the Toolbox
- Search for "Merge vector layers" and double-click it
- For "Input layers", click the "..." button and "Add Directory", selecting the folder where you saved the zipfiles
- Once you see all the state files listed, click "OK"
- Leave the "Destination CRS" blank (it will use the CRS from the files)
- Leave "Merged" set to "[Create temporary layer]" so we can check the output before saving
- Click "Run"
After a few seconds, you should see the US on the map, and a new temporary layer called "Merged" in the list of layers.
If it looks good, right-click "Merged" > "Make Permanent..."
- GeoPackage (the default format) is a fine format -- just one file!
- For "File name", click the "..." button to specify the output location and filename (something like 'puma2010.gpkg')
- Ignore all the other options, and click "OK"
- Right-click "Merged" > "Rename layer" and rename it "puma2010"
The GEOID10 column contains the nationwide unique IDs for each PUMA. We want to add a new column that will contain a list of all the IDs of PUMAs that are adjacent to a given PUMA. There are several ways to do this in QGIS, but the fastest technique is to use the "Join attributes by location" processing tool.
- In QGIS, under the Processing menu, open the Toolbox.
- Search for "Join attributes by location" and double-click it
- Set "Base Layer" = "puma2010"
- Set "Join Layer" = "puma2010"
- "Geometric predicate" should be "intersects"
- For "Fields to add", click the "..." button and select "GEOID10"
- "Join type" should be "Create separate feature for each matching feature (one-to-many)"
- Leave the other settings as is, and click "Run"
Wait until processing is complete (about 50 seconds), and then you should see a new layer called "Joined layer".
- Close the processing window
- Right-click the "Joined layer" > "Open Attribute Table"
- Click the "GEOID10" column header to sort by that column
You'll notice that the same record is repeated several times, but if you scroll to the right, you'll see a new column called "GEOID10_2", which has the ID of one of the neighboring PUMAs. The basic table structure is like this (omitting most of the other columns):
GEOID10 | ... | GEOID10_2 |
---|---|---|
1 | ... | 2 |
1 | ... | 3 |
1 | ... | 7 |
2 | ... | 1 |
2 | ... | 4 |
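To make the one-to-many structure concrete, here is a toy sketch of a layer joined to itself with an "intersects" test. The boxes and IDs are invented for illustration, and real PUMAs are irregular polygons, so this only mimics the logic of the join and is not what QGIS runs internally. It shows two things to look for in the real output: a shape always intersects itself, so rows where GEOID10_2 equals GEOID10 can appear, and shapes that touch only at a corner also count as intersecting.

```python
# Toy "PUMAs" as axis-aligned boxes: id -> (xmin, ymin, xmax, ymax).
# These shapes are hypothetical; they are not real PUMA boundaries.
boxes = {
    "1": (0, 0, 1, 1),
    "2": (1, 0, 2, 1),  # shares an edge with 1
    "3": (1, 1, 2, 2),  # shares an edge with 2, but only a corner with 1
    "4": (3, 3, 4, 4),  # touches nothing else
}


def intersects(a, b):
    """True if two boxes share any point (interior, edge, or corner)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def self_join(features):
    """One row per (feature, matching feature) pair, as in a one-to-many join."""
    return [
        (i, j)
        for i, a in features.items()
        for j, b in features.items()
        if intersects(a, b)
    ]


pairs = self_join(boxes)

# Every feature matches itself; drop those rows to keep true neighbors only.
neighbor_pairs = [(i, j) for i, j in pairs if i != j]

for geoid, neighbor in neighbor_pairs:
    print(geoid, neighbor)
```

If self-matches or corner-only neighbors are not wanted in the real data, they need to be filtered out in a later step.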
At this point, you could save this table as a CSV file with just the columns you need:
- Right-click "Joined layer" > Export > Save Features As
- Set "Format" = "Comma Separated Value (CSV)"
- For "File name", click the "..." button to specify the output location and filename (puma2010_neighbor_pairs.csv)
- Uncheck any fields you do not want to include
- Skip the other options, and click "OK" at the bottom
It may be preferable to group together all the neighbors for each PUMA, like this:
GEOID10 | neighbors |
---|---|
1 | 2,3,7 |
2 | 1,4 |
We can restructure the data using a QGIS virtual layer:
- Layer menu > Create Layer > New Virtual Layer...
- Paste the following query:
select GEOID10, group_concat(geoid10_2) as neighbors
from "Joined layer"
group by GEOID10
- Under "Geometry", select "No geometry"
- Click "Add", then "Close"
A new table layer "virtual_layer" should appear. Right-click > Open Attribute Table to view it.
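QGIS virtual layers are queried with SQLite syntax, so the grouping query can be tried outside QGIS with Python's built-in sqlite3 module. The sketch below is only a stand-in: it assumes an in-memory table named "Joined layer" holding the five sample pairs from the table earlier, not the real joined layer.

```python
import sqlite3

# Sample pairs from the GEOID10 / GEOID10_2 table shown earlier.
# IDs are stored as text so that leading zeros in real GEOIDs would be preserved.
rows = [("1", "2"), ("1", "3"), ("1", "7"), ("2", "1"), ("2", "4")]

conn = sqlite3.connect(":memory:")
conn.execute('create table "Joined layer" (GEOID10 text, GEOID10_2 text)')
conn.executemany('insert into "Joined layer" values (?, ?)', rows)

# Same query as the virtual layer, plus an ORDER BY for stable row order.
query = """
select GEOID10, group_concat(GEOID10_2) as neighbors
from "Joined layer"
group by GEOID10
order by GEOID10
"""
result = conn.execute(query).fetchall()
conn.close()

for geoid, neighbors in result:
    print(geoid, neighbors)
```

SQLite does not guarantee the order of values inside each group_concat list, so sort the IDs afterwards if a stable order matters.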
To save the table as a CSV file:
- Right-click "virtual_layer" > Export > Save Features As
- Set "Format" = "Comma Separated Value (CSV)"
- For "File name", click the "..." button to specify the output location and filename (puma2010_neighbors.csv)
- Skip the other options, and click "OK" at the bottom
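Once exported, the grouped file can be read back for analysis. The following is a minimal sketch, assuming the CSV has the two columns GEOID10 and neighbors and that the comma-separated neighbor list is quoted, as a CSV writer would normally do. The function name `load_neighbors` is made up for illustration, and the sample text stands in for the real file.

```python
import csv
import io


def load_neighbors(csv_file):
    """Map each GEOID10 to a list of neighbor IDs, keeping the IDs as strings."""
    reader = csv.DictReader(csv_file)
    return {
        row["GEOID10"]: [n for n in row["neighbors"].split(",") if n]
        for row in reader
    }


# Stand-in for the exported file, using the sample values from the table above.
sample = io.StringIO('GEOID10,neighbors\n1,"2,3,7"\n2,"1,4"\n')
neighbors = load_neighbors(sample)

for geoid, ids in neighbors.items():
    print(geoid, ids)
```

For the real file, pass `open("puma2010_neighbors.csv", newline="")` instead of the sample. Keeping the IDs as strings avoids losing leading zeros in GEOID10 values.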