Instantly share code, notes, and snippets.

Embed
What would you like to do?
A Python Protocol for Geospatial Data

Author: Sean Gillies Version: 1.0

Abstract

This document describes a GeoJSON-like protocol for geo-spatial (GIS) vector data.

Introduction

Python has a number of built-in protocols (descriptors, iterators, etc). A very simple and familiar one involves string representations of objects. The built-in str() function calls the __str__() method of its single argument. By implementing __str__(), instances of any class can be printed by any other Python program.

>>> class A(object):
...     def __str__(self):
...         return "Eh!"
...
>>> a = A()
>>> str(a)
'Eh!'
>>> "%s" % a
'Eh!'

What if we could do something like this for geo-spatial objects? It might, for example, let any object be analyzed using any other hypothetical software package like this:

>>> from some_analytic_module import as_geometry
>>> as_geometry(obj).buffer(1.0).area   # obj is a "point" of some kind
3.1365484905459389

The hypothetical as_geometry() function of the hypothetical some_analytic_module module would access relevant data of its single argument using an agreed upon method or attribute.

__geo_interface__

Following the lead of numpy's Array Interface [1], let's agree on a __geo_interface__ property. To avoid creating even more protocols, let's make the value of this attribute a Python mapping. To further minimize invention, let's borrow from the GeoJSON format [2] for the structure of this mapping.

The keys are:

type (required)
A string indicating the geospatial type. Possible values are "Feature" or a geometry type: "Point", "LineString", "Polygon", etc.
bbox (optional)
A tuple of floats that describes the geo-spatial bounds of the object: (left, bottom, right, top) or (west, south, east, north).
properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.
geometry (optional)
The geometric object of a "Feature" type, also as a mapping.
coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

Examples

First, a toy class with a point representation:

>>> class Pointy(object):
...     __geo_interface__ = {'type': 'Point', 'coordinates': (0.0, 0.0)}
...
>>> as_geometry(Pointy()).buffer(1.0).area
3.1365484905459389

Next, a toy class with a feature representation:

>>> class Placemark(object):
...     __geo_interface__ = {
...         'type': 'Feature',
...         'properties': {'name': 'Phoo'},
...         'geometry': Pointy.__geo_interface__ }
>>> from my_analytic_module import as_feature
>>> as_feature(Placemark())['properties']['name']
'Phoo'

Implementations

Python programs and packages that you have heard of – and made be a frequent user of – already implement this protocol:

Shapely

Shapely [7] provides a shape() function that makes Shapely geometries from objects that provide __geo_interface__ and a mapping() function that writes geometries out as dictionaries:

>>> from shapely.geometry import Point
>>> from shapely.geometry import mapping, shape
>>> Point(0.0, 0.0).__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
>>> shape(Point(0.0, 0.0))
<shapely.geometry.point.Point object at 0x...>
>>> mapping(Point(0.0, 0.0))
{'type': 'Point', 'coordinates': (0.0, 0.0)}

The Shapely version of the example in the introduction is:

>>> from shapely.geometry import shape
>>> shape(obj).buffer(1.0).area
3.1365484905459389

where obj could be a geometry object from ArcPy or PySAL, or even a mapping directly:

>>> shape({'type': 'Point', 'coordinates': (0.0, 0.0)}).buffer(1.0).area
3.1365484905459389

References

[1]http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
[2]https://tools.ietf.org/html/rfc7946
[3]https://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy-functions/asshape.htm
[4]https://bitbucket.org/sgillies/descartes/src/f97e54f3b8d4/descartes/patch.py#cl-14
[5]http://pypi.python.org/pypi/geojson/
[6]https://pysal.readthedocs.io/en/latest/users/tutorials/shapely.html
[7]https://github.com/Toblerity/Shapely
@NathanW2

This comment has been minimized.

Show comment
Hide comment
@NathanW2

NathanW2 May 21, 2013

Pretty cool. Would be good to implement this into pyqgis too.

NathanW2 commented May 21, 2013

Pretty cool. Would be good to implement this into pyqgis too.

@springmeyer

This comment has been minimized.

Show comment
Hide comment
@springmeyer

springmeyer Sep 18, 2013

Just added to mapnik as well: mapnik/mapnik#2009

springmeyer commented Sep 18, 2013

Just added to mapnik as well: mapnik/mapnik#2009

@cleder

This comment has been minimized.

Show comment
Hide comment
@cleder

This comment has been minimized.

Show comment
Hide comment

cleder commented Oct 25, 2013

@jzmiller1

This comment has been minimized.

Show comment
Hide comment
@jzmiller1

jzmiller1 Jul 12, 2014

Should geo_interface have an optional crs key?

jzmiller1 commented Jul 12, 2014

Should geo_interface have an optional crs key?

@perrygeo

This comment has been minimized.

Show comment
Hide comment
@perrygeo

perrygeo Jul 13, 2014

Since we already support geometries and features, let's go all the way and optionally allow representation of the complete GeoJSON hierarchy including FeatureCollections. Any objections from current users?

perrygeo commented Jul 13, 2014

Since we already support geometries and features, let's go all the way and optionally allow representation of the complete GeoJSON hierarchy including FeatureCollections. Any objections from current users?

@ayanmukg

This comment has been minimized.

Show comment
Hide comment
@ayanmukg

ayanmukg Nov 7, 2014

hi.. if the point is in lat,long and i want the buffer to be in meters or kilometers, is there a way to implement that?

ayanmukg commented Nov 7, 2014

hi.. if the point is in lat,long and i want the buffer to be in meters or kilometers, is there a way to implement that?

@jzmiller1

This comment has been minimized.

Show comment
Hide comment
@jzmiller1

jzmiller1 Jan 20, 2015

I am not sure if anyone is still looking at this but should geo_interface have an optional crs key? I am curious if it was omitted for a reason or was just looked over.

jzmiller1 commented Jan 20, 2015

I am not sure if anyone is still looking at this but should geo_interface have an optional crs key? I am curious if it was omitted for a reason or was just looked over.

@aronbierbaum

This comment has been minimized.

Show comment
Hide comment
@aronbierbaum

aronbierbaum Feb 19, 2015

@sgillies: Shouldn't the coordinates returned from __geo_interface__ be a list instead of a tuple to conform to the GeoJSON spec?

aronbierbaum commented Feb 19, 2015

@sgillies: Shouldn't the coordinates returned from __geo_interface__ be a list instead of a tuple to conform to the GeoJSON spec?

@aolieman

This comment has been minimized.

Show comment
Hide comment
@aolieman

aolieman May 8, 2015

@aronbierbaum, I don't think so. Coordinate pairs don't benefit from being stored in a mutable python type, and a tuple is an efficient choice for what we want to represent here. JSON is a serialization format, and as such is inherently immutable.

I guess what you've overlooked here is that __geo_interface__ specifies an interface, not a serialization format. The geojson package provides a way to serialize __geo_interface__ values to GeoJSON (see encoding/decoding).

aolieman commented May 8, 2015

@aronbierbaum, I don't think so. Coordinate pairs don't benefit from being stored in a mutable python type, and a tuple is an efficient choice for what we want to represent here. JSON is a serialization format, and as such is inherently immutable.

I guess what you've overlooked here is that __geo_interface__ specifies an interface, not a serialization format. The geojson package provides a way to serialize __geo_interface__ values to GeoJSON (see encoding/decoding).

@micahcochran

This comment has been minimized.

Show comment
Hide comment
@micahcochran

micahcochran Sep 20, 2015

@jzmiller1 I think that __geo_interface__ should have an optional crs key. However, specifying the format could be a little problematic. I have some ideas on how to do it, but I feel that a post might be a little too limited. I am creating a document to explain my ideas of adding crs to the __geo_interface__.

micahcochran commented Sep 20, 2015

@jzmiller1 I think that __geo_interface__ should have an optional crs key. However, specifying the format could be a little problematic. I have some ideas on how to do it, but I feel that a post might be a little too limited. I am creating a document to explain my ideas of adding crs to the __geo_interface__.

@cleder

This comment has been minimized.

Show comment
Hide comment
@cleder

cleder May 4, 2016

https://github.com/fortyninemaps/karta also implements the __geo_interface__.

cleder commented May 4, 2016

https://github.com/fortyninemaps/karta also implements the __geo_interface__.

@bixb0012

This comment has been minimized.

Show comment
Hide comment
@bixb0012

bixb0012 Jan 9, 2017

I think it would be good to include some additional examples that clarify how tuples and lists should be handled in the output. Although there may not be a difference in terms of processing the output, there is a difference in terms of appearance, and there seems to be some debate as to which is the "better" way to go.

For example, should MULTILINESTRING((35 35, 45 45), (5 15, 15 25)) output look like

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

or

{'type': 'MultiLineString', 'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0)))}

bixb0012 commented Jan 9, 2017

I think it would be good to include some additional examples that clarify how tuples and lists should be handled in the output. Although there may not be a difference in terms of processing the output, there is a difference in terms of appearance, and there seems to be some debate as to which is the "better" way to go.

For example, should MULTILINESTRING((35 35, 45 45), (5 15, 15 25)) output look like

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

or

{'type': 'MultiLineString', 'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0)))}
@MRigal

This comment has been minimized.

Show comment
Hide comment
@MRigal

MRigal Jan 10, 2017

Any known minimal adapter of the geo_interface for psycopg2 to avoid using the Python2-constricted ppygis or the heavier ogr or shapely?

MRigal commented Jan 10, 2017

Any known minimal adapter of the geo_interface for psycopg2 to avoid using the Python2-constricted ppygis or the heavier ogr or shapely?

@micahcochran

This comment has been minimized.

Show comment
Hide comment
@micahcochran

micahcochran May 10, 2017

@bixb0012

From the spec text:

coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

This reads as if you should use tuples for a coordinates only and lists for any more complicated geometry. So the first example seems like it is correct:

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

Implementations are a whole different matter.
python-geojson seems to use lists all the way down:

>>> import geojson
In [5]: geojson.MultiLineString(coordinates=[[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])

Out[5]: {"coordinates": [[[35.0, 35.0], [45.0, 45.0]], [[5.0, 15.0], [15.0, 25.0]]], "type": "MultiLineString"}

pygeoif uses tuples all the way down.

>>> import pygeoif
In [12]: g=pygeoif.MultiLineString([[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])
In [15]: g.__geo_interface__
Out[15]: {'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0))), 'type': 'MultiLineString'}

micahcochran commented May 10, 2017

@bixb0012

From the spec text:

coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

This reads as if you should use tuples for a coordinates only and lists for any more complicated geometry. So the first example seems like it is correct:

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

Implementations are a whole different matter.
python-geojson seems to use lists all the way down:

>>> import geojson
In [5]: geojson.MultiLineString(coordinates=[[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])

Out[5]: {"coordinates": [[[35.0, 35.0], [45.0, 45.0]], [[5.0, 15.0], [15.0, 25.0]]], "type": "MultiLineString"}

pygeoif uses tuples all the way down.

>>> import pygeoif
In [12]: g=pygeoif.MultiLineString([[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])
In [15]: g.__geo_interface__
Out[15]: {'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0))), 'type': 'MultiLineString'}
@jmoujaes

This comment has been minimized.

Show comment
Hide comment
@jmoujaes

jmoujaes Sep 8, 2018

@sgillies thanks for this work!

Are coordinates purposely represented as tuples or should they be lists? There has been discussion above for both cases.
The RFC (as @aronbierbaum mentioned) says arrays, but as @aolieman mentioned above, it could be tuples.

Thanks!

jmoujaes commented Sep 8, 2018

@sgillies thanks for this work!

Are coordinates purposely represented as tuples or should they be lists? There has been discussion above for both cases.
The RFC (as @aronbierbaum mentioned) says arrays, but as @aolieman mentioned above, it could be tuples.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment