@drwelby
Last active May 1, 2023 14:56

Maxar Open Data GeoParquet STAC Catalog

GeoParquet is an experimental standard for storing geospatial data in the Parquet format. Because Parquet's columnar layout allows efficient reads over HTTP, GeoParquet is considered an initial attempt at a "cloud-optimized" vector format.
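
Parquet keeps its metadata (schema, row groups, and the byte offsets of each column chunk) in a footer at the end of the file, followed by a 4-byte little-endian footer length and the magic bytes `PAR1`. A reader can therefore fetch the last 8 bytes, then the footer, then only the column chunks a query needs, all with HTTP range requests. The Python sketch below covers just the first two steps; `read_range` is a hypothetical stand-in for a ranged HTTP read, and real readers (pyarrow, DuckDB, GDAL) also decode the Thrift-encoded footer, which is not shown.

```python
import struct

PARQUET_MAGIC = b"PAR1"

def footer_byte_range(file_size: int, tail: bytes) -> tuple[int, int]:
    """Return (offset, length) of the footer metadata, given the file size
    and the file's last 8 bytes."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: missing trailing PAR1 magic")
    # The 4 bytes before the trailing magic hold the footer length (little-endian uint32).
    (footer_len,) = struct.unpack("<I", tail[:4])
    # The footer sits directly before those final 8 bytes.
    offset = file_size - 8 - footer_len
    if offset < len(PARQUET_MAGIC):  # bytes 0-3 are the leading magic
        raise ValueError("footer length is larger than the file")
    return offset, footer_len

def read_footer(read_range, file_size: int) -> bytes:
    """Fetch the footer with two ranged reads.

    `read_range(start, length)` is a hypothetical callable, e.g. a wrapper
    around an HTTP GET with a `Range: bytes=start-(start+length-1)` header.
    """
    tail = read_range(file_size - 8, 8)
    offset, length = footer_byte_range(file_size, tail)
    return read_range(offset, length)

# In-memory stand-in for a remote file: leading magic, 100 bytes of
# "column data", a 20-byte fake footer, the footer length, trailing magic.
fake_file = b"PAR1" + b"x" * 100 + b"F" * 20 + struct.pack("<I", 20) + b"PAR1"
footer = read_footer(lambda start, n: fake_file[start:start + n], len(fake_file))
print(len(footer))  # 20
```

However large the file, the reader has touched only 28 bytes at this point; the footer then tells it which byte ranges hold the columns it wants.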

While it lacks optimized spatial reads, it can filter on a small fraction of a feature's fields, which makes it a good fit for storing and querying STAC Items. A STAC Item is a GeoJSON Feature, so it has a spatial component, but it can also carry a large number of metadata fields, many of which are redundant and rarely useful to query. GeoParquet lets us query for features with simple filter requirements like "all images with a low cloud cover percentage".
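
The pay-off of the columnar layout is that a filter like "low cloud cover" only has to read the cloud-cover column plus whatever columns it returns. The plain-Python sketch below mimics that with a dict of columns. The column names follow the STAC fields used later in this gist (`tile:clouds_percent` and the `visual` asset href, flattened here for simplicity), but the IDs and values are made up, and real Parquet readers additionally skip whole row groups using per-column min/max statistics.

```python
# Toy column-oriented table with hypothetical values, standing in for a few
# of the STAC Item columns in the GeoParquet file. A real file has many more
# columns (geometry, other metadata) that this query never touches.
columns = {
    "id": ["36/120022030011/A", "36/120022030012/B", "36/120022030013/C"],
    "tile:clouds_percent": [2.0, 35.5, 7.0],
    "assets.visual.href": [
        "https://example.com/a.tif",
        "https://example.com/b.tif",
        "https://example.com/c.tif",
    ],
}

def select_where(cols, select, where, predicate):
    """Return cols[select][i] for every row i where predicate(cols[where][i]) is true.

    Only the `where` and `select` columns are read.
    """
    keep = [predicate(value) for value in cols[where]]
    return [value for value, k in zip(cols[select], keep) if k]

low_cloud = select_where(columns, "assets.visual.href", "tile:clouds_percent", lambda c: c < 10)
print(low_cloud)  # ['https://example.com/a.tif', 'https://example.com/c.tif']
```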

The first attempt at converting the full Open Data Catalog to GeoParquet is at:

s3://maxar-opendata/events/maxar-opendata.parquet

or

https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet
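
The same object is reachable under three spellings in this gist: the `s3://` URI, the HTTPS URL, and the `/vsis3/` path GDAL uses below. A small Python sketch of the mapping, assuming the bucket is addressed virtual-hosted-style without a region in the hostname (as in the URL above); the function names are illustrative only.

```python
from urllib.parse import urlparse

def _bucket_and_key(uri: str) -> tuple[str, str]:
    """Split s3://bucket/key into ("bucket", "/key")."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path

def s3_to_https(uri: str) -> str:
    """s3://bucket/key -> https://bucket.s3.amazonaws.com/key"""
    bucket, key = _bucket_and_key(uri)
    return f"https://{bucket}.s3.amazonaws.com{key}"

def s3_to_vsis3(uri: str) -> str:
    """s3://bucket/key -> GDAL virtual file system path /vsis3/bucket/key"""
    bucket, key = _bucket_and_key(uri)
    return f"/vsis3/{bucket}{key}"

src = "s3://maxar-opendata/events/maxar-opendata.parquet"
print(s3_to_https(src))  # https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet
print(s3_to_vsis3(src))  # /vsis3/maxar-opendata/events/maxar-opendata.parquet
```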

Reading with OGR

GDAL/OGR (since version 3.5, via its Parquet driver) can read GeoParquet and run SQL on it, so you can query it much like a STAC API.

Note: OGR can't handle some of the fields, so expect some warnings.

Get all the tiles covering the Turkey earthquake area (longitude 35 to 38, latitude 35 to 38) with less than 10% cloud cover:

ogr2ogr --config AWS_NO_SIGN_REQUEST YES -f GeoJSON \
   turkey_earthquake.geojson /vsis3/maxar-opendata/events/maxar-opendata.parquet \
   -sql 'SELECT * FROM maxar_opendata WHERE "tile:clouds_percent" < 10' \
   -spat 35 35 38 38
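
`-spat` takes the rectangle as `xmin ymin xmax ymax` in the layer's coordinates, here longitude then latitude, so `35 35 38 38` is 35°E to 38°E and 35°N to 38°N. The Python sketch below shows the bounding-box overlap test behind that window. It is a simplification: OGR guarantees that features whose envelope overlaps the filter are returned and may also test the exact geometry. The tile extents in the usage lines are hypothetical.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float  # west edge (longitude)
    ymin: float  # south edge (latitude)
    xmax: float  # east edge (longitude)
    ymax: float  # north edge (latitude)

def envelopes_overlap(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes share at least one point (edges count)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax

# The window passed as: -spat 35 35 38 38
SPAT = BBox(35, 35, 38, 38)

# Hypothetical tile extents for illustration.
straddling_edge = BBox(37.9, 37.0, 38.5, 37.5)  # crosses the east edge of the window
far_west = BBox(28.9, 41.0, 29.1, 41.1)         # well outside the window
print(envelopes_overlap(straddling_edge, SPAT))  # True
print(envelopes_overlap(far_west, SPAT))         # False
```

A tile that only partly overlaps the window is still selected, which is why the output can extend slightly beyond the requested rectangle.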

DuckDB

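DuckDB gives you a fast SQL engine on top of Parquet files. While it doesn't "officially" handle spatial queries yet (see https://github.com/duckdblabs/duckdb_spatial), we can use the quadkey identifiers and the LIKE operator as a simple spatial filter: item IDs start with the zone and quadkey, and a tile's quadkey is a prefix of the quadkeys of all the smaller tiles inside it, so matching on a prefix selects everything within one larger tile. The zone and quadkey "36/120022" roughly covers the Turkey earthquake area.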
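
The prefix trick works because a quadkey is built one digit per zoom level, each digit (0 to 3) choosing one quadrant of the parent tile, so every tile inside `120022` has a key starting with `120022`. Below is a Python sketch of the same test the `LIKE` pattern performs. The helper names and item IDs are hypothetical, level 12 is an arbitrary example rather than the catalog's actual tile level, and the sketch does not model where Maxar's grid places tiles on the ground.

```python
QUADKEY_DIGITS = set("0123")

def is_quadkey(s: str) -> bool:
    return bool(s) and set(s) <= QUADKEY_DIGITS

def id_in_tile(item_id: str, zone: str, quadkey_prefix: str) -> bool:
    """Python equivalent of:  "id" LIKE '<zone>/<quadkey_prefix>%'"""
    if not is_quadkey(quadkey_prefix):
        raise ValueError(f"bad quadkey: {quadkey_prefix!r}")
    return item_id.startswith(f"{zone}/{quadkey_prefix}")

def children(quadkey: str) -> list[str]:
    """The four sub-tiles one level down: each appends one digit."""
    return [quadkey + d for d in "0123"]

def tiles_inside(quadkey: str, level: int) -> int:
    """How many tiles at `level` fall inside the tile `quadkey` (x4 per extra digit)."""
    if level < len(quadkey):
        raise ValueError("level must be at least the quadkey length")
    return 4 ** (level - len(quadkey))

# Hypothetical item IDs in the zone/quadkey/... layout the query assumes.
print(id_in_tile("36/120022030011/EXAMPLE", "36", "120022"))  # True
print(id_in_tile("36/120023000000/EXAMPLE", "36", "120022"))  # False
print(tiles_inside("120022", 12))                             # 4096
```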

To read over HTTP(S) or from S3, first install and load the httpfs extension:

INSTALL httpfs;
LOAD httpfs;

Get the visual GeoTIFF URLs for the Turkey earthquake area with less than 10% cloud cover:

SELECT assets.visual.href
  FROM read_parquet('https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet')
  WHERE
     "id" LIKE '36/120022%'
     AND "tile:clouds_percent" < 10;
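
To reuse the DuckDB query above for another area or cloud threshold, a small helper can assemble it. This is a hypothetical sketch: `visual_hrefs_query` is not part of any library, it quotes string literals by doubling single quotes, and it assumes the zone/quadkey ID layout described above.

```python
PARQUET_URL = "https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet"

def sql_string(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def visual_hrefs_query(zone: int, quadkey_prefix: str, max_clouds: float) -> str:
    """Build the visual-href query for any zone, quadkey prefix, and cloud limit."""
    if not quadkey_prefix or set(quadkey_prefix) - set("0123"):
        raise ValueError(f"quadkey digits must be 0-3, got {quadkey_prefix!r}")
    pattern = f"{int(zone)}/{quadkey_prefix}%"
    return (
        "SELECT assets.visual.href\n"
        f"  FROM read_parquet({sql_string(PARQUET_URL)})\n"
        "  WHERE\n"
        f"     \"id\" LIKE {sql_string(pattern)}\n"
        f"     AND \"tile:clouds_percent\" < {float(max_clouds)};"
    )

print(visual_hrefs_query(36, "120022", 10))
```

The resulting string can be pasted into the DuckDB CLI or passed, for example, to `duckdb.sql(query)` from the `duckdb` Python package once httpfs is available.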