Skip to content

Instantly share code, notes, and snippets.

@danabauer
Last active March 11, 2024 18:41
Show Gist options
  • Save danabauer/893bd9ce6fa1df8b60888981149ba64e to your computer and use it in GitHub Desktop.
Save danabauer/893bd9ce6fa1df8b60888981149ba64e to your computer and use it in GitHub Desktop.
down the rabbit hole of Overture's admins dataset, 2024-02-15 data release

This query lets us grab the locality (point geometry and attributes) that represents the United States and its associated area, in this case a multipolygon geometry.

CREATE VIEW admins_view AS (
    SELECT
        *
    FROM
        read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)COPY (
COPY (
SELECT
            admins.id,
            admins.subType,
            admins.isoCountryCodeAlpha2,
            names.primary AS primary_name,
            sources[1].dataset AS primary_source,
            areas.areaId,
            ST_GeomFromWKB(areas.areaGeometry) as geometry
    FROM admins_view AS admins
    INNER JOIN (
        SELECT
            id as areaId,
            localityId,
            geometry AS areaGeometry
        FROM admins_view
    ) AS areas ON areas.localityId == admins.id
    WHERE admins.adminLevel = 1
    AND admins.isoCountryCodeAlpha2 = 'US'
    ) TO 'us-boundary.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');

Would something like this work to get all the buildings within that multipolygon area for the United States?

CREATE VIEW buildings_view AS (
    SELECT
        *
    FROM
        read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=buildings/type=*/*', filename=true, hive_partitioning=1)
);

SELECT 
    admins.id, 
    admins.subType, 
    admins.isoCountryCodeAlpha2,
    admins.names.primary,
    sources[1].dataset AS primary_source,
    --areas.areaId,
    buildings.id,
    buildings.names.primary AS building_name,
    ST_GeomFromWKB(areas.areaGeometry) as geometry
FROM admins_view as admins
    
INNER JOIN (
    buildings_view as buildings ON ST_Intersects(ST_GeomFromWKB(areas.areaGeometry), ST_GeomFromWKB(buildings.geometry))

WHERE
    admins.adminLevel = 1
    AND admins.names.primary LIKE 'United States'

LIMIT 10;

Get a list and count of each locality type in the Overture admins dataset.

SELECT 
    adminLevel, localityType, COUNT(1) n
FROM 
    read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
WHERE 
    localityType IS NOT NULL
GROUP BY adminLevel, localityType
ORDER BY n DESC;

See the output table below. Why are so many locality types missing an admin level? Also seems like there are lots of incorrect assignments. Create task to investigate.

adminLevel localityType n
hamlet 1634526
village 1423009
neighborhood 790512
suburb 153593
town 110569
3 county 34763
city 15825
4 region 8255
4 village 6338
3 town 4618
borough 4089
2 state 3357
municipality 2455
4 district 1652
3 city 1144
3 municipality 1038
4 county 1021
4 town 733
county 731
4 city 444
district 414
3 district 412
3 region 373
1 country 281
3 province 201
2 province 143
2 county 142
3 state 87
3 borough 67
2 region 61
2 district 56
2 municipality 25
4 suburb 12
4 hamlet 6
4 municipality 3
state 2
region 2
4 neighborhood 2

Find the features at adminlevel = 2 (state level) that match the name Pennsylvania. Grab the GERS ID for PA.

SELECT id, isoSubCountryCode, contextId, names.primary
  FROM 
    read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
   WHERE type='locality'
   AND subType='administrativeLocality'
   AND adminLevel=2
   AND names.primary LIKE 'Pennsylvania';

Now we have the GERS ID for Pennsylvania and its context ID (aka its parent ID), which is also the GERS ID for the United States.

id isoSubCountryCode contextId primary
varchar varchar varchar varchar
0857b2b73fffffff01914c596abbfacd US-PA 0850c861bfffffff01aeb407d56d3441 Pennsylvania

Get all the children associated with Pennsylvania's GERS ID.

CREATE VIEW admins_view AS (
    SELECT
        *
    FROM
        read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
);

COPY (
  WITH RECURSIVE hierarchy(id, adminLevel, localityType, contextId, names) AS (
      SELECT id, adminLevel, localityType, contextId, names.primary as names
      FROM 
        admins_view
      WHERE 
        type = 'locality'
      AND id = '0857b2b73fffffff01914c596abbfacd'
     
 
      UNION ALL
 
      SELECT admins.id, admins.adminLevel, admins.localityType, admins.contextId, admins.names.primary as names
      FROM admins_view AS admins
      JOIN hierarchy
      ON admins.contextId = hierarchy.id
      WHERE 
        type = 'locality'
      )
  
  SELECT *
  FROM hierarchy
  ORDER BY adminLevel, localityType, id
) TO 'beautiful_pennsylvania.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');

From that resulting GeoJSON, I pulled out the GERS ID for Fishtown, a neighborhood in Philadelphia: 08522a98bfffffff01ebb35b247d2da3. Let's walk it up to the root.

   CREATE VIEW admins_view AS (
     SELECT
        *
     FROM
        read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
);

COPY (
   WITH RECURSIVE hierarchy(id, adminLevel, localityType,
                         contextId, names, geometry,
                         isoCountryCodeAlpha2, isoSubCountryCode, sources
   ) AS (
  SELECT id, adminLevel, localityType, contextId, names, geometry, isoCountryCodeAlpha2, isoSubCountryCode, sources
    FROM admins_view
   WHERE 
     type = 'locality'
     AND id = '08522a98bfffffff01ebb35b247d2da3'
 
   UNION ALL
 
  SELECT admins.id, admins.adminLevel, admins.localityType,
         admins.contextId, admins.names, admins.geometry,
         admins.isoCountryCodeAlpha2, admins.isoSubCountryCode, admins.sources
    FROM admins_view AS admins
    JOIN hierarchy
      ON admins.id = hierarchy.contextId
   WHERE 
     type = 'locality'
)
  SELECT 
         id, 
         adminLevel, 
         localityType,
         names.primary AS name,
         isoCountryCodeAlpha2 AS country, 
         isoSubCountryCode AS subCountry,
         ST_GeomFromWKB(geometry) as geometry,
         contextId AS parentId,
         sources[1].dataset AS source
    FROM hierarchy
ORDER BY adminLevel, localityType, id
) TO 'fishtown-to-us.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');

Here's a table of each administrative unit from neighborhood to country, with attributes. The GeoJSON file contains the geometries.

gers_id locality_type name parent_id source
0850c861bfffffff01aeb407d56d3441 country United States OpenStreetMap
0857b2b73fffffff01914c596abbfacd state Pennsylvania 0850c861bfffffff01aeb407d56d3441 OpenStreetMap
08522114ffffffff01a739d1d7602762 county Philadelphia County 0857b2b73fffffff01914c596abbfacd OpenStreetMap
08522f093fffffff013db4adcc60888e city Philadelphia 08522114ffffffff01a739d1d7602762 OpenStreetMap
08522a98bfffffff01ebb35b247d2da3 neighborhood Fishtown 08522f093fffffff013db4adcc60888e OpenStreetMap
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment