@aaronpdennis
Last active December 14, 2016 02:28
Processing Planet.osm and serving HDM vector tiles.

Building and Updating a Global HDM Vector Tile Source from OSM Data

The Mapbox Studio style hdm-style.tm2 is a humanitarian map design that relies on three vector tile sources, two of which are hosted and served by Mapbox:

  1. Mapbox Streets
  2. Mapbox Terrain
  3. Humanitarian Data Model vector tiles

The third source is the focus of this writeup. Currently, hdm-style.tm2 uses a hand-processed, non-updating snapshot of HDM vector tiles covering only small subregions of the world. This gist explains, in rough pseudocode and theory, how you could set up an HDM vector tile source that updates hourly (or minutely, or daily) from the OSM database and covers the entire planet.

Architecture

This section first describes my old solution for generating HDM vector tiles for subregions, then explains how I would set up a global HDM vector tile source that updates continuously.

Old Processing Pipeline

The current workflow to create snapshots of HDM vector tiles for parts of the world relies on parsing and filtering osm.pbf files with node-osmium to create large file dumps of GeoJSON features, then generating vector tiles from those files with Tippecanoe and uploading the output .mbtiles file to the Mapbox servers.

nepal-latest.osm.pbf  ==>  node-osmium script  ==>  GeoJSON features  ==>  tippecanoe  ==>  Mapbox Upload
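The filtering step in this pipeline can be sketched in plain JavaScript. The tag checks below are illustrative only; the real node-osmium script selects features according to the Humanitarian Data Model, and its exact tag matching is not shown here:

```javascript
// Illustrative sketch: map an OSM object's tags to an HDM layer name.
// These particular tag checks are examples, not the actual filter used
// by the node-osmium script.
function hdmLayer(tags) {
  if (tags.emergency) return 'emergency';   // e.g. emergency=assembly_point
  if (tags.amenity === 'hospital' || tags.amenity === 'clinic' ||
      tags.amenity === 'pharmacy') return 'medical';
  if (tags.man_made === 'water_well' || tags.man_made === 'water_tower' ||
      tags.amenity === 'drinking_water') return 'water_source';
  return null;  // not an HDM feature; the script skips it
}
```

Features that pass a filter like this are written out as GeoJSON for tippecanoe to consume.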

The problem with this workflow is that keeping the tiles up to date at a global scale would mean reprocessing the entire planet's HDM dataset every hour of every day.

Proposed Solution

A seemingly less computation- and resource-intensive solution would be to build and continuously update a database of OSM humanitarian data, then set up a server that responds to requests for HDM vector tiles by generating only the requested tiles from that database. Many of the ideas below come from reading through the Mapbox GitHub account, and I expect Mapbox uses a similar architecture on their backend for vector tile sources.

The major components to this solution are:

  1. Processing OSM data and difference files with node-osmium
  2. Writing GeoJSON from processing to an existing AWS DynamoDB and S3 bucket with cardboard
  3. Using tilelive-cardboard to generate vector tiles from the cardboard database
  4. Serving those tiles from a TileJSON URL
(initial building of database)
planet.osm.pbf
 \\
  \\==>  node-osmium script  ==\\
                                \\
                                 ====>  cardboard{[ DynamoDB ]}  ==>  tilelive.js ==> server responding to zxy tile requests
                                //
  //==>  node-osmium script  ==//
 //
osm-diff-files.osc 
(updating database)

Our hdm-style.tm2 would then point to that TileJSON URL in its list of sources, alongside mapbox.mapbox-streets-v5 and mapbox.mapbox-terrain-v2.

Step 1: Building the Database

The database will be stored on AWS DynamoDB for small feature storage and AWS S3 for large feature storage. Processing of the OSM database .pbf files should probably occur on an AWS EC2 c3.8xlarge instance. The processing for the planet will probably require ~30 GB of memory.

Install the dependencies: node (version 0.10; the current release of osmium only runs on node 0.10), turf, and osmium. Also, configure your AWS credentials with $ export AWS_ACCESS_KEY_ID=<your-access-key-id> and $ export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>.

Next, we need either a recent pbf file of the entire OSM planet, or the whole planet split into regional pbf files. I haven't tried this at the planet scale yet, but my first attempt would be to download all the continent pbf files from http://download.geofabrik.de/ and process each one into our database, something like this from the EC2 command line:

wget http://download.geofabrik.de/africa-latest.osm.pbf
wget http://download.geofabrik.de/antarctica-latest.osm.pbf
wget http://download.geofabrik.de/asia-latest.osm.pbf
wget http://download.geofabrik.de/australia-oceania-latest.osm.pbf
wget http://download.geofabrik.de/central-america-latest.osm.pbf
wget http://download.geofabrik.de/europe-latest.osm.pbf
wget http://download.geofabrik.de/north-america-latest.osm.pbf
wget http://download.geofabrik.de/south-america-latest.osm.pbf

Then, process those pbf files with this script.

It will depend on you having set up an AWS DynamoDB that looks like this:

Table Name:          hdm-vt-table
Primary Hash Key:    dataset (String)
Primary Range Key:   id (String)

and an AWS S3 bucket that looks like this:

Bucket Name:         hdm-vt-bucket
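A cardboard client pointed at this table and bucket might be configured like the sketch below. The region and prefix values are assumptions on my part; check cardboard's README for the exact options it expects:

```javascript
// Hypothetical cardboard configuration for the table and bucket above.
var Cardboard = require('cardboard');
var cardboard = Cardboard({
  table: 'hdm-vt-table',    // the DynamoDB table described above
  region: 'us-east-1',      // assumption: your AWS region
  bucket: 'hdm-vt-bucket',  // the S3 bucket described above
  prefix: 'hdm'             // assumption: S3 key prefix for large features
});
```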

Download the script and run it like this:

wget https://gist.githubusercontent.com/aaronpdennis/763ed66d7dbe8273ae57/raw/7d51960630edbc44ec2ec8905f05889af5516875/build-dataset.js
node build-dataset.js \
      ./africa-latest.osm.pbf \
      ./antarctica-latest.osm.pbf \
      ./asia-latest.osm.pbf \
      ./australia-oceania-latest.osm.pbf \
      ./central-america-latest.osm.pbf \
      ./europe-latest.osm.pbf \
      ./north-america-latest.osm.pbf \
      ./south-america-latest.osm.pbf

Note that some of the Geofabrik regions overlap, but the overlap should be minor, and overlapping data is simply overwritten based on OSM id. A bigger problem might be mid-Atlantic areas that no regional extract covers (if there's any data there). A planet-latest.osm.pbf file might work as well and is worth a try.

This should give us the latest OSM data we need for HDM vector tiles stored on AWS and usable through a cardboard interface.

Step 2: Updating the Database

Note: I haven't tried any of this. Here begins the pseudocode.

In a similar spirit as the Mapbox Streets vector tile source, we want our HDM vector tiles to update in real-time to quickly incorporate the efforts of OSM contributors. To do this, we should download and process OSM difference files to update the database we built in Step 1.

Updating should begin from the moment the planet.osm.pbf file we used was generated. Planet.osm/diffs are published by OSM every minute, hour, or day, depending on how frequently you want to update your database. For the HDM vector tiles, it might be best to process five minutely diff files at a time on a smaller EC2 instance: the map still updates frequently, but we avoid downloading and processing a file every 60 seconds.
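Minutely diffs are published under sequence numbers. As a sketch (assuming the standard planet replication layout, where sequence number 3549741 maps to the file 003/549/741.osc.gz), the download path for a given sequence number can be computed like this:

```javascript
// Build the replication path for an OSM diff sequence number, e.g.
// 3549741 -> "003/549/741.osc.gz", relative to a base URL such as
// http://planet.openstreetmap.org/replication/minute/
function replicationPath(sequence) {
  var s = String(sequence);
  while (s.length < 9) s = '0' + s;  // zero-pad to nine digits
  return s.slice(0, 3) + '/' + s.slice(3, 6) + '/' + s.slice(6, 9) + '.osc.gz';
}
```

A small cron-style loop could track the last processed sequence number and fetch the next five paths in a batch.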

To implement an updating database, the OSM diff .osc files should be downloaded and processed by a node-osmium script similar to the one in Step 1. The script should now also check each object's visible property (assigned by node-osmium, indicating whether the node/way was modified or deleted) and apply this logic:

if (node.visible || way.visible) {
  // continue processing the object and write its GeoJSON to the cardboard database
} else {
  // look up the OSM id of the node or way in the cardboard database
  // and remove that feature
}

The script in Step 1 assigns features in the database IDs following the same conventions as Mapbox Streets. Use these IDs to access features in the database:

| Original OSM object | Vector tile geometry | ID transform | Example |
| --- | --- | --- | --- |
| node | point | original + 10^15 | 123 → 1000000000000123 |
| way | line | none | 123 → 123 |
| way | polygon, point | original + 10^12 | 123 → 1000000000123 |
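The ID transform above can be expressed as a small helper. This is a sketch; the function name is mine, not from the Step 1 script:

```javascript
// Apply the Mapbox Streets-style ID transform described above.
// Both offsets stay well within JavaScript's safe integer range.
function vectorTileId(osmId, osmType, geometryType) {
  if (osmType === 'node') return osmId + 1e15;              // node -> point
  if (osmType === 'way' && geometryType === 'line') return osmId;
  return osmId + 1e12;                                      // way -> polygon/point
}
```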

Successfully implemented, this would provide a global, updating datasource from which we can serve up HDM vector tiles.

Step 3: Generating and Serving Vector Tiles

Now that we have a database, we need to generate vector tiles and provide them to applications that want to use HDM Vector Tiles. Instead of processing all of OpenStreetMap into vector tiles every time an update comes through, we'll set up a server that responds to requests for specific tiles by generating each tile from the database on demand and returning it in the response.

The node module tilelive-cardboard will do the job of implementing tilelive.js to generate HDM Vector Tiles from the cardboard interface to our DynamoDB and S3.

We would need to set up a server that listens for requests of the form tiles/{z}/{x}/{y} at some URL and parses each request to identify which tile is wanted under the "z,x,y" scheme. Here is an example of setting up such a tile server with express.js, pulling vector tiles from an MBTiles file. For our purposes, we would replace the MBTiles file with tilelive-cardboard implemented around our earlier cardboard interface to the database.
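When a z/x/y request comes in, the server (via tilelive-cardboard) only needs the data inside that tile's bounding box, which is what makes on-demand generation cheap. The spherical-mercator math for that box is standard; here is a self-contained sketch (the function name is mine):

```javascript
// Compute the WGS84 bounding box [west, south, east, north] of a z/x/y tile
// in the standard spherical-mercator tiling scheme.
function tileToBBox(z, x, y) {
  function lon(xt) { return xt / Math.pow(2, z) * 360 - 180; }
  function lat(yt) {
    var n = Math.PI - 2 * Math.PI * yt / Math.pow(2, z);
    return 180 / Math.PI * Math.atan(0.5 * (Math.exp(n) - Math.exp(-n)));
  }
  // y increases southward, so y+1 gives the southern edge
  return [lon(x), lat(y + 1), lon(x + 1), lat(y)];
}
```

A bbox like this is what a cardboard spatial query for the requested tile would be fed before the features are encoded as a vector tile.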

Step 4: Creating a TileJSON Endpoint

Applications like Mapbox Studio, hosted .tm2 projects, and Mapbox GL can use vector tiles from a TileJSON URL. This is a JSON file hosted on the web that describes a vector tileset: a description, attribution, and a "tiles" property giving a tiles/{z}/{x}/{y} URL template where the tiles can be found. In our case, the "tiles" property should point to wherever we're serving the HDM Vector Tiles. Below is what the hdm-tiles.json TileJSON document might look like for HDM Vector Tiles, as derived from our Humanitarian Data Model.

{  
  "tilejson": "2.1.0",
  "name": "Humanitarian Data Model",
  "description": "Humanitarian data from OpenStreetMap in vector tile format.",
  "attribution": "<a href=\"https://americanredcross.github.io\" target=\"_blank\">&copy; American Red Cross</a> <a href=\"http://www.openstreetmap.org/about/\" target=\"_blank\">&copy; OpenStreetMap</a> <a class=\"mapbox-improve-map\" href=\"https://www.mapbox.com/map-feedback/\" target=\"_blank\">Improve this map</a>",
  "scheme": "xyz",
  "tiles": [
    "http://americanredcross.github.io/hdm-vector-tiles/{z}/{x}/{y}.vector.pbf"
  ],
  "minzoom":0,
  "maxzoom":14,
  "bounds": [
    -180,
    -85.0511,
    180,
    85.0511
  ],
  "center": [
    0,
    0,
    0
  ],
  "vector_layers": [
    {
      "description": "",
      "fields": {
        "class": "One of: shelter, camp, residential, common, rubble, landslide",
        "osm_id": "Unique OSM ID number"
      },
      "id": "site",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "class": "One of: surface, condition, barrier",
        "osm_id": "Unique OSM ID number",
        "surface": "One of: unpaved, rough",
        "condition": "One of: bad, impassable, collapsed",
        "barrier": "One of: checkpoint, debris, gate"
      },
      "id": "road_condition",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "osm_id": "Unique OSM ID number",
        "class": "One of: construction, damaged, collapsed, flooded"
      },
      "id": "building_condition",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "osm_id": "Unique OSM ID number",
        "class": "One of: medical rescue, fire fighter, lifeguarding, assembly point, access point, emergency phone, emergency siren, helicopter"
      },
      "id": "emergency",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "class": "One of: generator, transmission, distribution",
        "power_source": "One of: wind, solar, hydro, gas, coal, biomass, nuclear",
        "structure": "One of: tower, pole, line",
        "osm_id": "Unique OSM ID number"
      },
      "id": "electric_utility",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "osm_id": "Unique OSM ID number",
        "class": "One of: tower, dish, notice board"
      },
      "id": "communication",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "class": "One of: shower, toilet, waste",
        "osm_id": "Unique OSM ID number"
      },
      "id": "sanitation",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "class": "One of: hospital, clinic, field hospital, pharmacy, dispensary, morgue",
        "osm_id": "Unique OSM ID number"
      },
      "id": "medical",
      "source": "americanredcross.humanitarian-data-model-v1"
    },
    {
      "description": "",
      "fields": {
        "class": "One of: water well, water tower, water tank, spring, drinking water",
        "potable": "One of: yes, no",
        "pump": "One of: yes, manual, powered"
      },
      "id": "water_source",
      "source": "americanredcross.humanitarian-data-model-v1"
    }
  ]
}

Put it all together, and you could have a solution for serving global, continuously updating HDM vector tiles.

@aaronpdennis
Author

Revisited this and realized it's not possible to update a cardboard DynamoDB database from OSM diffs: the .osc change files assume you already know the coordinates of all the nodes inside modified ways. That's a blocker for this workflow.

Also, I think I would modify this to use tilelive-cardboard with AWS Lambda functions and an Amazon API Gateway REST API for better cost efficiency and scalability.
