Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save dannguyen/dbcf95d4d44641f4b8887556c47d5892 to your computer and use it in GitHub Desktop.
Save dannguyen/dbcf95d4d44641f4b8887556c47d5892 to your computer and use it in GitHub Desktop.
PostGIS example (in Carto, i.e. Postgres) of joining Census tract data to a larger entity (Baltimore Police Dept. Districts) and doing a group count. Solves a classic GIS problem -- the Baltimore Census tract shapefiles have demographic/pop info, but the Baltimore police district shapes do not.

The problem

One shapefile has data -- Census demographics population per census tract. But we want to use another type of shape -- police district boundaries -- and count how many people live within those districts.

This requires using a spatial join to associate all the little shapes (Census tracts) with their containing shape (police district)

The data

The police districts data table, which has only shape data and no population info:

https://dun2.carto.com/tables/police_districts

It looks like this when mapped:

image

The Census tracts data table, which contains shape data AND population info:

https://dun2.carto.com/dataset/census_tracts

Which looks like this:

image

When these shapefiles are joined, using query.sql (included below), we get a table of police districts (9 rows) and the aggregated demographic/pop data of their respective census tracts.

Link to Carto dataset that results from this: https://dun2.carto.com/dataset/bpd_districts_and_census

Note that it contains no geospatial data, it's just a table of census numbers, but there is a field for district_name and district_number, which means you can join this table (bpd_districts_and_census) with the police_districts (which does have shapefile info for the district boundaries) to make a proper map.


About the JOIN and ST_CENTROID

The JOIN condition:

FROM 
  census_tracts AS ct
INNER JOIN 
  police_districts AS pd
      ON ST_CONTAINS(
            pd.the_geom,
            ST_Centroid(ct.the_geom)
      )

This works by reducing all the Census Tracts to just their centroids, i.e. a single point, which makes it much easier for them to be "inside" or "contained" by the geometry of the respective police district.

You can read about the ST_CENTROID method here: https://postgis.net/docs/ST_Centroid.html

The reason why ST_CONTAINS (or ST_WITHIN) doesn't work with shapefiles is because the boundaries of the census tracts may not exactly fall within or overlap the containing police district:

FROM 
  census_tracts AS ct
INNER JOIN 
  police_districts AS pd
      ON ST_CONTAINS(
            pd.the_geom,
            ct.the_geom
      )

The above JOIN gets you something like this (the purple shapes represent the only tracts that are properly joined):

image

SELECT
pd.dist_name AS district_name
, pd.district AS district_number
, SUM(ct.population) AS population
, SUM(ct.white) AS white
, SUM(ct.blk_afam) AS black
, SUM(ct.asian) AS asian
, SUM(ct.hisp_lat) AS hispanic
FROM
census_tracts AS ct
INNER JOIN
police_districts AS pd
ON ST_CONTAINS(
pd.the_geom,
ST_Centroid(ct.the_geom)
)
GROUP BY district_name, district_number
ORDER BY district_number ASC;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment