@danabauer
Last active February 29, 2024 17:28
Dana's recap of the Overture BOF at Mapping USA in January 2024

Hey folks! Yesterday I listened to the Overture Maps Birds of a Feather session at Mapping USA and typed up some quick impressions. Jennings led the session and Ben Clark and Marc participated. It was a good discussion that drew lots of interesting questions, and I wanted to share those here.


Jennings spoke during the second day of Mapping USA, a free, virtual conference organized by OpenStreetMap US. There were about 40 people listening on Zoom, a mix of OpenStreetMap members and contributors as well as other folks with a more casual interest in OSM. Jennings stepped through slides, but the session was very interactive, and people asked questions throughout. A different audience might not care so much about how the sausage is made, but this group wanted to hear how Overture assembles its data products and how OSM data flows into Overture’s processing pipelines.

Throughout the discussion, Jennings hit on three main selling points:

  • Overture is doing the hard work of data conflation for you.
  • Overture is tackling one of the most difficult problems in geospatial — data interoperability.
  • Overture is committed to releasing data in the most developer-friendly formats.

On point 1, he outlined the broad strokes of the data conflation process, using the recently released buildings data as an example. Jennings emphasized that Overture starts with data created by people, then fills in the gaps with data generated from machine learning models. More specifically, the flow of data into the pipeline goes like: OSM, Esri, Google, Microsoft, then Google again, at each step referring back to OSM data “as a single source of buildings truth.” What comes out of the pipeline is data with a fixed and easy-to-use schema. The process is different for Overture’s transportation data, where OSM data is currently the only source. Jennings said to think of it as “one step of processing” beyond raw OSM so that developers don’t have to deal with OSM-specific things like relations. Later in the session, he clarified that there is another step behind the scenes: raw OSM data goes through Meta’s Daylight validation tooling before it hits Overture’s pipelines.
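The gap-filling flow Jennings described can be sketched in a few lines. This is purely illustrative (not Overture’s actual code): each source is visited in priority order, people-made data wins, and lower-priority ML-derived sources only contribute features that aren’t already covered. The `footprint` key is a hypothetical stand-in for whatever matching logic decides two buildings are the same.

```python
# Illustrative sketch of priority-ordered conflation, NOT Overture's
# real pipeline. Sources earlier in the list win; later sources only
# fill gaps left by what has already been kept.

def conflate(sources):
    """sources: list of (source_name, features) in priority order.
    Each feature is a dict; 'footprint' is a stand-in match key.
    Returns one feature per footprint, tagged with its winning source."""
    kept = {}
    for name, features in sources:
        for feat in features:
            key = feat["footprint"]
            if key not in kept:  # lower-priority data only fills gaps
                kept[key] = {**feat, "source": name}
    return list(kept.values())

# Hypothetical pipeline order echoing the session: OSM first, ML after.
pipeline = [
    ("osm", [{"footprint": "A"}, {"footprint": "B"}]),
    ("ml",  [{"footprint": "B"}, {"footprint": "C"}]),  # B duplicates OSM
]
result = conflate(pipeline)
# Building B keeps its OSM-sourced record; ML contributes only C.
```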

Some questions from the audience:

  • What’s the difference between raw OSM data, Daylight data, and Overture data?
  • What does Meta’s Daylight validation tooling do?
  • Is there separate conflation logic for different Overture data themes? Will that logic and the order of data in the pipeline change?
  • Is Overture discarding features and attributes as it compares and combines data sources?

On selling point 2, Jennings talked about Overture’s use of the Global Entity Reference System (GERS) to conquer interoperability and bring disparate data sets together. The audience seemed to easily grasp the idea of a stable, unique identifier for each feature, but they had some pointed questions about the process of keying data to GERS:

  • What do you define as a stable entity?
  • At what step in the Overture pipeline is a feature/entity assigned a GERS id?
  • And two very OSM-specific questions: “Is there an expected publication of any metadata (such as business rules) that would facilitate the conversion between OSM tagging and the Overture schema format, and vice versa? In other words, if we wish to create our own conversion code…” and “Is data from Overture going to be integrated into Rapid? Do you plan to add an editing experience to Overture data?” (In the middle of asking this question, the audience member seemed to realize they could grab an Overture data extract in GeoJSON or a link to PMTiles made from Overture data and run with it in Rapid.)
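The core idea behind a stable identifier can be made concrete with a small sketch. This is hypothetical logic, not GERS itself: a registry matches incoming features against entities it has already seen, and a matched feature keeps its existing id across releases instead of getting a fresh one. Real entity matching would use geometry and attributes with tolerances; the exact `match_key` here is a stand-in for that.

```python
# Illustrative sketch of the stable-identifier idea behind GERS.
# Hypothetical logic, not Overture's implementation: a feature keeps
# its id across data releases as long as it matches a known entity.

import uuid

class IdRegistry:
    def __init__(self):
        self._ids = {}  # match key -> previously assigned stable id

    def assign(self, feature):
        # A real system would match on geometry/attributes with some
        # tolerance; an exact key stands in for that here.
        key = feature["match_key"]
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())  # first sighting: mint an id
        feature["id"] = self._ids[key]
        return feature

registry = IdRegistry()
# The same building appears in two releases with a changed attribute...
release_1 = registry.assign({"match_key": "building-123", "height": 10})
release_2 = registry.assign({"match_key": "building-123", "height": 12})
# ...and keeps the same stable id in both.
```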

About 15 minutes into his slides, Jennings gave live demos of Overture’s buildings data in fused.io and kepler.gl, helping the audience visualize the different sources of data and giving an example of how ML data is nicely filling in gaps in OSM data in India. The demos made it easy to bring up selling point 3: the super developer-friendly way Overture is offering its data. Jennings talked about making the decision to release data in geoparquet, a cloud-native format, so that people could explore and use the data in situ without downloading it. “This is a new way for us to work,” he said. And if developers do want to work locally, tools like duckdb make it easy to download only what they need.

Jennings emphasized that Overture’s goal is to make geospatial data accessible to developers so they can run with it and build what they want. Last year, when Overture released its first data set, people converted it to geoparquet and put it on Source Cooperative. “That was a win for everyone. That’s exactly what we wanted to see,” he said. “We learned from our developer community.”
