What are the Daylight OpenStreetMap Parquet Files?
Listed on the registry of Open Data on AWS, the Daylight OpenStreetMap Parquet files contain the latest Daylight Map Distribution of OpenStreetMap in an analysis-ready format. This dataset is optimized for cloud-based queries with Amazon Athena, meaning anyone can access the entire dataset with SQL queries in the browser, without the need to download or access the files directly.
The Daylight Map Distribution of OpenStreetMap is always openly available for download in the standard OSM PBF format (find it at daylightmap.org). The parquet files, however, were first made available alongside Daylight release v1.9. They contain fully resolved geometries and additional metadata including areas, lengths, and quadkeys, not present in the PBF.
In total, the OSM features files contain nearly 1B features including 178M+ nodes, 816M+ ways, and 5M+ relations. These are all renderable features, meaning they have geometries that can be rendered on a map. The OSM features files therefore do not include untagged nodes or relations without basic geometries (such as turn restrictions). Untagged objects can also be found in this analysis-ready format in the OSM elements files, described in the comments.
Setting Up The
CREATE TABLE command will add the
daylight_osm_features table to your AWS data (i.e. Glue) catalog. For best performance, make sure your Athena console is running in the same region as the
CREATE EXTERNAL TABLE `daylight_osm_features`( `id` bigint, `version` int, `changeset` bigint, `created_at` timestamp, `tags` map<string,string>, `wkt` string, `min_lon` double, `max_lon` double, `min_lat` double, `max_lat` double, `quadkey` string, `linear_meters` double, `square_meters` double) PARTITIONED BY ( `release` string, `type` string) STORED AS PARQUET LOCATION 's3://daylight-openstreetmap/parquet/osm_features/'
Next, load the partitions by running:
MSCK REPAIR TABLE daylight_osm_features.
Remember to use
WHERE release = to query only one dataset (such as
v1.9). These are the release numbers of the Daylight Map Distribution. Omitting this
WHERE clause will result in querying duplicate features.
1. Total kilometers of different highway features in OSM?
SELECT tags [ 'highway' ] AS highway_tag, count(id) AS osm_features, count(distinct(changeset)) AS total_changesets, sum(linear_meters) / 1000 AS total_km FROM daylight_osm_features WHERE linear_meters > 0 AND tags['highway'] IS NOT NULL AND release='v1.9' GROUP BY tags [ 'highway' ] ORDER BY total_km DESC
2. Extract point geometries for all hospital features in OSM?
SELECT tags['name'] as name, ST_CENTROID(ST_GEOMETRYFROMTEXT(wkt)) as center, tags['amenity'] as amenity FROM daylight_osm_features WHERE tags['amenity'] LIKE 'hospital' AND release='v1.9'