Rich-Harris/geojson.md

A better GeoJSON

GeoJSON is a widely used format for encoding geographic data. It's flexible and human-readable, and because it's just JSON it's easy to integrate into web applications.
But it has some real warts, and if we wanted to we could certainly come up with a better format. After tweeting about my frustrations, I was asked to elaborate. Here goes:
Redundancy

GeoJSON geometries can be one of seven types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon and GeometryCollection.
I've never seen a GeometryCollection in the wild, but let's be generous and assume they do exist. That leaves six, three of which are completely unnecessary: Point, LineString and Polygon.
They're unnecessary because these two things are functionally equivalent:
{
  type: 'Polygon',
  coordinates: [ outerRing, hole1, hole2, ... ]
}

{
  type: 'MultiPolygon',
  coordinates: [[ outerRing, hole1, hole2, ... ]]
}
The Polygon version is a few bytes shorter (a difference that in real-world applications will evaporate due to gzip), but apart from that it's just a special case of a MultiPolygon.
But because that special case exists, code like this exists in just about every application that works with GeoJSON directly:
if ( geometry.type === 'Polygon' ) {
  renderPolygon( geometry.coordinates );
} else {
  geometry.coordinates.forEach( renderPolygon );
}
Something like that – along with the constant mental context switching that goes along with it (wait, at this point in the code am I dealing with a coordinate pair? a ring? something else?) – has to exist every time you touch GeoJSON. The cost of that special case is astronomical in proportion to its benefit. The same goes for LineString and Point.
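One way to live with the special cases today is to normalise everything to the Multi* form at the edge of your app, so that the rest of the code only ever sees one shape. This is a minimal sketch, not part of any library: `toMulti` and `MULTI_TYPE` are names I've made up for illustration, and GeometryCollection is deliberately ignored.

```javascript
// Hypothetical lookup from each single type to its Multi* equivalent
const MULTI_TYPE = {
  Point: 'MultiPoint',
  LineString: 'MultiLineString',
  Polygon: 'MultiPolygon'
};

// Hypothetical helper: wrap single geometries so callers only see Multi* types
function toMulti( geometry ) {
  const multiType = MULTI_TYPE[ geometry.type ];

  // Already a Multi* geometry (or a type this sketch doesn't handle)
  if ( !multiType ) return geometry;

  return { type: multiType, coordinates: [ geometry.coordinates ] };
}

const square = {
  type: 'Polygon',
  coordinates: [ [ [ 0, 0 ], [ 1, 0 ], [ 1, 1 ], [ 0, 1 ], [ 0, 0 ] ] ]
};

// After normalising, the rendering code needs only one branch
toMulti( square ).coordinates.forEach( polygon => {
  console.log( `polygon with ${polygon.length} ring(s)` );
});
```

The wrapping costs one extra array per geometry, which is exactly the overhead the Polygon special case was saving in the first place.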
Right now I'm working with a clipping library (I won't name and shame) that can return either Polygon coordinates or MultiPolygon coordinates. And it doesn't tell you which! You have to figure out by yourself whether the second and third items are separate polygons, or holes in the first one. That sort of confusion is deeply harmful to productivity, and totally unnecessary.
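For what it's worth, the workaround I'd reach for looks something like the sketch below: count how deeply the first number is nested and wrap the result if it turns out to be a lone polygon. Both function names are hypothetical, and the approach assumes the result is non-empty and well-formed, which is a lot to assume.

```javascript
// Hypothetical helper: count how deeply the first number is nested.
// A coordinate pair has depth 1, a ring 2, a Polygon 3, a MultiPolygon 4.
function nestingDepth( coordinates ) {
  let depth = 0;
  let node = coordinates;

  while ( Array.isArray( node ) ) {
    depth += 1;
    node = node[0];
  }

  return depth;
}

// Hypothetical helper: always hand back MultiPolygon-style coordinates
function asMultiPolygonCoordinates( coordinates ) {
  return nestingDepth( coordinates ) === 3 ? [ coordinates ] : coordinates;
}

const ring = [ [ 0, 0 ], [ 1, 0 ], [ 1, 1 ], [ 0, 0 ] ];

console.log( nestingDepth( [ ring ] ) );     // Polygon coordinates
console.log( nestingDepth( [ [ ring ] ] ) ); // MultiPolygon coordinates
```

Having to sniff the structure like this is the problem in a nutshell: the information was available when the data was created, and then thrown away.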
Another example of redundancy is the fact that a Polygon ring must end with a coordinate pair that matches the first one. Why? In many applications your code for handling polygons will share functions with your code for handling line strings, and I've had to write code like this more times than I can count:
// Polygon rings repeat their first point at the end, so skip the last item
const end = /Polygon/.test( type ) ? line.length - 1 : line.length;

for ( let i = 0; i < end; i += 1 ) {
  doSomethingWith( line[i] );
}
Of course there are some cases where it is easier to iterate over an array of coordinate pairs that ends where it started – but in my experience it's almost always easier to adapt code that expects a closed ring so that it works with an open one than to do the reverse. Bonus: the file gets smaller.
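To illustrate how little it takes to make "closed ring" code work with open rings, here's a sketch of the shoelace area formula where a modulo supplies the final edge back to the start. `signedArea` and `openRing` are hypothetical names, and the rings are assumed to be arrays of 2D `[ x, y ]` pairs.

```javascript
// Shoelace formula over an open ring (first point NOT repeated at the end).
// The modulo wraps the final edge back round to the first point.
function signedArea( ring ) {
  let sum = 0;

  for ( let i = 0; i < ring.length; i += 1 ) {
    const [ x0, y0 ] = ring[i];
    const [ x1, y1 ] = ring[ ( i + 1 ) % ring.length ];
    sum += x0 * y1 - x1 * y0;
  }

  return sum / 2;
}

// Hypothetical helper for data that still arrives GeoJSON-style:
// drop the last point if it duplicates the first
function openRing( ring ) {
  const first = ring[0];
  const last = ring[ ring.length - 1 ];
  const closed = ring.length > 1 && first[0] === last[0] && first[1] === last[1];

  return closed ? ring.slice( 0, -1 ) : ring;
}

const closedSquare = [ [ 0, 0 ], [ 1, 0 ], [ 1, 1 ], [ 0, 1 ], [ 0, 0 ] ];

console.log( signedArea( openRing( closedSquare ) ) ); // 1 (counter-clockwise)
```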
Performance

Every single point in a GeoJSON file gets its own array. That's terrible for performance, because allocating arrays isn't free, and garbage collecting them is liable to cause jank. Performant code relies on flat structures.
Instead of this...
[ [ x0, y0 ], [ x1, y1 ], [ x2, y2 ], ... ]
...we could do this:
[ x0, y0, x1, y1, x2, y2, ... ]
If you ever need to write any WebGL code, or find yourself triangulating your geometry, you'll quickly find that this is a more convenient way of working.
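If your data is already in GeoJSON, getting to the flat form is a single pass. The sketch below assumes 2D coordinates, and `flattenRing` is a name I've invented for the example.

```javascript
// Hypothetical helper: turn [ [ x0, y0 ], [ x1, y1 ], ... ]
// into [ x0, y0, x1, y1, ... ]. Assumes exactly two values per point.
function flattenRing( ring ) {
  const flat = new Array( ring.length * 2 );

  for ( let i = 0; i < ring.length; i += 1 ) {
    flat[ i * 2 ] = ring[i][0];
    flat[ i * 2 + 1 ] = ring[i][1];
  }

  return flat;
}

console.log( flattenRing( [ [ 0, 0 ], [ 10, 0 ], [ 10, 10 ] ] ) );
// [ 0, 0, 10, 0, 10, 10 ]
```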
Perhaps you're thinking that it'll make things harder, because instead of doing this...
ring.forEach( coords => {
  ctx.lineTo( coords[0], coords[1] );
});
...you'd have to do this...
for ( let i = 0; i < ring.length; i += 2 ) {
  ctx.lineTo( ring[i], ring[i+1] );
}
...but that's a good thing, because the second example will be much faster. The right data structure encourages the right programming habits.
As a bonus, it's very easy to convert those flat arrays to typed arrays, which have excellent performance characteristics (because the browser is able to make stronger guarantees about their behaviour). You can also do really cool things like instantly transferring the data to a web worker to do expensive computation off the main thread, without the cost of serialization/deserialization.
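Here's roughly what that looks like. The message shape (`dimensions`, `coords`) is something I've made up for the example; the only real APIs involved are `Float64Array.from` and the transfer list that `postMessage` accepts as its second argument.

```javascript
const flat = [ 0, 0, 10, 0, 10, 10, 0, 10 ];

// Copy the flat array into a typed array backed by a single ArrayBuffer
const coords = Float64Array.from( flat );

// Hypothetical message shape; only `coords.buffer` goes in the transfer list
const message = { dimensions: 2, coords };
const transferList = [ coords.buffer ];

// In a browser, with `worker` being an existing Worker instance:
//   worker.postMessage( message, transferList );
// After that call the buffer is detached on the sending side and
// coords.length reads 0, so keep a copy if you still need the data.

console.log( coords.length, coords.byteLength ); // 8 64
```

Transferring moves ownership of the buffer rather than copying it, which is why the sender loses access afterwards.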
One thing to note: if you have a flat array, you can't detect the dimensionality of the data by querying the first point. But that's also a good thing – it forces you to be explicit.
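Being explicit could be as simple as storing the dimensionality next to the coordinates. This is only a sketch of one possible shape; the `dimensions` field and the `forEachPoint` helper are assumptions of mine, not part of any existing format.

```javascript
// Hypothetical shape: flat coordinates plus an explicit `dimensions` field
const line = {
  dimensions: 3,
  coordinates: [ 0, 0, 100, 5, 0, 120, 5, 5, 90 ] // x, y, z triples
};

// Hypothetical helper: visit each point by stepping `dimensions` at a time
function forEachPoint( { dimensions, coordinates }, callback ) {
  if ( coordinates.length % dimensions !== 0 ) {
    throw new Error( 'coordinate count is not a multiple of dimensions' );
  }

  for ( let i = 0; i < coordinates.length; i += dimensions ) {
    // Pass the array and an offset so no per-point array is allocated
    callback( coordinates, i );
  }
}

forEachPoint( line, ( c, i ) => {
  console.log( `x=${c[i]} y=${c[i + 1]} z=${c[i + 2]}` );
});
```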
Properties

Each GeoJSON feature can have arbitrary properties attached to it. That's useful in many situations, but I've never once actually used it in an app, because typically that data lives somewhere else so that it can be accessed by other parts of my app. All I want in my GeoJSON is geometry – the object's id field is enough. But every Feature is required to have a properties member, so if you leave out even an empty properties: {} (or properties: null), it's not valid GeoJSON. We don't need it.
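The pattern I have in mind looks something like this sketch: geometry-only records carrying an id, with everything else looked up from wherever the app already keeps it. The record shape, the ids and the `labelFor` helper are all placeholders invented for the example.

```javascript
// Hypothetical geometry-only records, identified by id (coordinates elided)
const shapes = [
  { id: 'a', type: 'MultiPolygon', coordinates: [] },
  { id: 'b', type: 'MultiPolygon', coordinates: [] }
];

// Application data lives elsewhere and is joined by id when needed
const labels = new Map([
  [ 'a', 'Region A' ],
  [ 'b', 'Region B' ]
]);

// Hypothetical helper: look up a shape's label, falling back to its id
function labelFor( shape ) {
  return labels.has( shape.id ) ? labels.get( shape.id ) : shape.id;
}

shapes.forEach( shape => console.log( labelFor( shape ) ) );
```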

Is there a realistic possibility that we could displace GeoJSON with a superior (but still human-readable) format? I don't know. But if anyone is interested in making it happen then let me know – maybe we can do something.