@DenisCarriere
Last active September 22, 2017 23:58

OSM Statistics Tiles

The goal of this dataset/project is to generate geospatial statistics from OpenStreetMap data and store them in JSON and PBF formats. These pre-computed tiles would enable data scientists to perform complex geospatial analysis at a country-wide scale with minimal CPU processing.

Geospatial statistics

Thousands of different geospatial statistics could be generated from a simple OSM dataset within a given BBox. Here is a list of the most useful geospatial statistics that would be defined:

  • highway=* (path, residential, secondary, etc...)
    • length (total length calculated in kilometers)
    • nodes (number of single vertices)
    • features (each individual feature)
  • landuse=* (residential, industrial, commercial, etc...)
    • sqkm area
    • nodes
    • features
  • natural=* (wood, water, grass, sand, etc...)
    • sqkm area
    • nodes
    • features
  • amenity=* (school, bar, college, etc...)
    • sqkm area
    • nodes
    • features
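As a rough illustration of how the highway=* statistics above might be computed, here is a minimal Python sketch. The input shape (a list of dicts with `tags` and `coords` keys), the function names, and the use of the haversine formula for length are all assumptions made for this example; they are not part of the proposal.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def highway_stats(ways):
    """Aggregate length, nodes and features per highway=* value.

    `ways` is a hypothetical input: dicts with "tags" (OSM tags) and
    "coords" (a list of (lon, lat) vertices), already clipped to the tile.
    """
    stats = defaultdict(lambda: {"length": 0.0, "nodes": 0, "features": 0})
    for way in ways:
        kind = way["tags"].get("highway")
        if kind is None:
            continue  # not a highway, ignore
        coords = way["coords"]
        entry = stats[kind]
        # Sum the length of every segment between consecutive vertices.
        entry["length"] += sum(
            haversine_km(p, q) for p, q in zip(coords, coords[1:])
        )
        entry["nodes"] += len(coords)
        entry["features"] += 1
    total = sum(entry["length"] for entry in stats.values())
    return {"length": total, **stats}
```

Note that this sketch counts a node once per way it belongs to, so a vertex shared by two ways is counted twice; a real implementation would need to decide whether that is the intended meaning of "nodes".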

Simplified JSON Example

{
  "tile": [655, 1582, 12],
  "quadtree": "023010203331",
  "highway": {
    "length": 13.23,
    "path": {
      "length": 11,
      "nodes": 10,
      "features": 2
    },
    "residential": {
      "length": 2.23,
      "nodes": 15,
      "features": 3
    }
  },
  "landuse": {
    "area": 0.4,
    "residential": {
      "area": 0.3,
      "nodes": 300,
      "features": 1
    },
    "commercial": {
      "area": 0.1
    }
  }
}
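The "quadtree" value in the example matches the quadkey of the `[x, y, zoom]` tile, using the common convention of interleaving the x and y bits from the most significant bit down. A minimal Python sketch of that conversion follows; the function name `tile_to_quadkey` is chosen here for illustration only.

```python
def tile_to_quadkey(x, y, zoom):
    """Convert an (x, y, zoom) tile into its quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)  # bit for this zoom level
        digit = 0
        if x & mask:
            digit += 1  # tile lies in the eastern half
        if y & mask:
            digit += 2  # tile lies in the southern half
        digits.append(str(digit))
    return "".join(digits)


print(tile_to_quadkey(655, 1582, 12))  # prints "023010203331"
```

Because every extra zoom level appends one digit, the quadkey of a parent tile is always a prefix of the quadkeys of its children, which makes it a convenient lookup key across zoom levels.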

Pattern detection

Geospatial processing can be a very CPU-intensive task and can slow your geospatial analysis to a crawl if done incorrectly. With this approach, one could focus on the aggregated geospatial properties without having to process all the geometry every time the analysis is run. Last-minute tweaking of an algorithm would then take seconds instead of hours of re-processing. Using these aggregated statistics, you can easily start defining patterns/signatures per tile and let software recognize the same pattern across the entire country. A similar use case would be training an AI system.
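To make the idea of a per-tile signature more concrete, here is a minimal Python sketch that treats a signature as a set of allowed ranges over aggregated values. The dotted-path lookup, the function names, and every threshold in `POKEMON_GO_LIKE` are hypothetical and chosen only for illustration.

```python
import math


def get_path(stats, path, default=0):
    """Look up a dotted path such as "highway.path.features" in a tile.

    Missing keys fall back to `default`. Paths are expected to point at
    numeric leaf values, not at nested objects.
    """
    node = stats
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def matches(stats, signature):
    """True if every value named in the signature lies within its range."""
    return all(
        low <= get_path(stats, path) <= high
        for path, (low, high) in signature.items()
    )


# Hypothetical signature: a residential tile with unusually many
# water features and paths. The thresholds are made up for this sketch.
POKEMON_GO_LIKE = {
    "landuse.residential.area": (0.1, math.inf),
    "natural.water.features": (3, math.inf),
    "highway.path.features": (5, math.inf),
}
```

Since `matches` only reads pre-computed numbers, changing a threshold and re-running it over every tile of a country involves no geometry processing at all.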

Vandalism/error detection examples

  • Pokemon Go (Lakes, Paths, Parks all in one small residential area).

image

  • High volume of non-orthogonalized buildings (typically from new users not using Q or S to square buildings).

image

  • High % of overlapping buildings (poor import or double import)

image

Zoom Levels

OSM statistics tiles shall be created for each zoom level, starting from zoom 12. Each additional zoom level splits a tile into four, so the BBox area covered by a single tile shrinks by 4x. Having tiles at multiple zoom levels would be beneficial: tiles at a finer zoom level can be eliminated quickly when their parent tile does not meet certain minimum requirements.
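The parent/child relationship between zoom levels can be sketched in a few lines of Python. Tiles are written as `(x, y, zoom)` tuples; the `stats_by_tile` dictionary, the `keep` predicate, and the function names are assumptions made for this example.

```python
def parent(tile):
    """Tile one zoom level up that contains the given tile."""
    x, y, zoom = tile
    return (x // 2, y // 2, zoom - 1)


def children(tile):
    """The four tiles one zoom level down that cover the given tile."""
    x, y, zoom = tile
    return [(2 * x + dx, 2 * y + dy, zoom + 1)
            for dy in (0, 1) for dx in (0, 1)]


def candidate_tiles(root, max_zoom, stats_by_tile, keep):
    """Walk down from `root`, skipping every subtree whose tile fails `keep`.

    `stats_by_tile` maps (x, y, zoom) to a pre-computed statistics dict and
    `keep` is a predicate on that dict; both are hypothetical inputs.
    """
    results = []
    stack = [root]
    while stack:
        tile = stack.pop()
        stats = stats_by_tile.get(tile)
        if stats is None or not keep(stats):
            continue  # prune: none of its children are visited
        if tile[2] == max_zoom:
            results.append(tile)
        else:
            stack.extend(children(tile))
    return results
```

With this layout, a zoom 12 tile that fails the predicate removes its four zoom 13 tiles and sixteen zoom 14 tiles from consideration in a single check.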

Zoom 12

image

Zoom 13

image

Zoom 14

image
