See https://www.geographyrealm.com/terrestrial-ecoregions-gis-data/ for background on the Ecoregions data and a link to the shapefile. The version discussed here was downloaded November 2024.
This document describes a few issues with importing that shapefile into TaxonWorks. The tl;dr: if you import the original file, record 206 won't import and record 748 will import as the wrong geometry type. The dropbox link in the next paragraph provides a version of Ecoregions2017 that imports into TaxonWorks without those issues; import that rather than the original copy. Read on if you've already imported the original copy.
If you're just looking for the new versions of the shapefile adjusted for import into TaxonWorks, they're at https://www.dropbox.com/scl/fo/36i8h58f60prip0tg2awr/ADQdPEgCc4XhruwgyRKaXS0?rlkey=kv0m4ujsx1q5bjqezx4a3r2x7&st=o2hr0nee&dl=0 (you don't need a dropbox account to download them).
- If you haven't imported the original Ecoregions2017 shapefile into TaxonWorks yet, download Ecoregsions2017_for_TW.zip from the dropbox link, unzip it, and import that shapefile into TW to get all of the Ecoregions shapes.
- If you've already imported the original Ecoregions2017.shp into TaxonWorks, delete the 'Transantarctic Mountains tundra' gazetteer and then import the 207_753_for_TW shapefile from the dropbox link. That adds two new Gazetteers to your original import: a new 'Transantarctic Mountains tundra' and 'Rock and Ice'.
If you have any issues or questions please let me know! The rest of this is more technical and not really related to user use of the TW-adjusted shapefile (unless you're collecting below -85.0511287 latitude!).
Now back to the original Ecoregions2017 shapefile: Ecoregions2017.zip is 143M, Ecoregions2017.shp is 232M, and the shapefile contains 847 polygon records:
$ ogrinfo -al -geom=NO -summary Ecoregions2017.shp
INFO: Open of `Ecoregions2017.shp'
using driver `ESRI Shapefile' successful.
Layer name: Ecoregions2017
Geometry: Polygon
Feature Count: 847
Extent: (-179.999989, -89.891973) - (180.000000, 83.623125)
Layer SRS WKT:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
OBJECTID: Real (32.10)
ECO_NAME: String (150.0)
BIOME_NUM: Real (32.10)
BIOME_NAME: String (254.0)
REALM: String (254.0)
ECO_BIOME_: String (254.0)
NNH: Integer64 (11.0)
ECO_ID: Integer64 (11.0)
SHAPE_LENG: Real (32.10)
SHAPE_AREA: Real (32.10)
NNH_NAME: String (64.0)
COLOR: String (7.0)
COLOR_BIO: String (7.0)
COLOR_NNH: String (7.0)
LICENSE: String (64.0)
$ ogrinfo Ecoregions2017.shp -dialect SQLite -sql "SELECT count(*) FROM Ecoregions2017 WHERE IsValid(geometry) = 0"
returns 69, with warnings all of the form 'GEOS warning: Ring Self-intersection at or near point 116.52615746989142 6.1101127119322456', i.e. 69 of the shapefile records are geometrically invalid because they have a self-intersection.
If you use the TaxonWorks Import Gazetteers task to import Ecoregions2017.shp, the import will complete successfully but there are two issues. One is a record that failed to import, which is reported in the UI. The other is a record that imported unexpectedly as a GeometryCollection, which (at least currently) you wouldn't know unless you view the GI wkt for that item.
- The error reported in the UI is for record 206 (ECO_ID 207), the record named 'Rock and Ice':
'TopologyException: side location conflict at -179.99828579499996 -85.051128700000007 0. This can occur if the input geometry is invalid.'
- Record 748 (ECO_ID 753), Transantarctic Mountains tundra, imports as a GeometryCollection containing a MultiPolygon and a MultiLineString (the expected shape is MultiPolygon).
- Neither of those shapes was one of the 69 reported as invalid! The ones that were invalid due to self-intersection were auto-corrected to be made valid on import, though that process doesn't always do what one might hope/expect, so there could still be some unexpected shape errors in the successfully imported shapes.
Side note: shape 206 is 37M as wkt, it has 1,273,688 vertices(!).
Everything below here includes some supposition based on the behavior I'm seeing - it's not based on looking at the RGeo source code and could be wrong. Corrections appreciated!
The reason for both issues boils down to the way that TaxonWorks (or rather RGeo, given TaxonWorks' RGeo settings) works with latitudes, namely by clamping all latitudes to the range [-85.0511287, 85.0511287]. (I haven't checked RGeo code to be sure, but this seems to correspond to the square limits of the pseudo-mercator projection.)
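As a quick sanity check on where that number comes from, here is a minimal sketch, assuming the limit really is the latitude at which the pseudo-mercator (EPSG:3857) map becomes a square. The `clamp_latitude` helper is hypothetical and only mimics the behavior described above; it is not RGeo's code.

```python
import math

# Latitude at which the pseudo-mercator map is as tall as it is wide:
# lat = atan(sinh(pi)), about 85.05112878 degrees. The 85.0511287 used in
# this document is that value truncated to 7 decimal places.
MERCATOR_MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def clamp_latitude(lat, limit=MERCATOR_MAX_LAT):
    """Clamp a latitude in degrees into [-limit, limit] (hypothetical helper)."""
    return max(-limit, min(limit, lat))


print(MERCATOR_MAX_LAT)
# The southern extent of the shapefile (-89.891973) is pulled up to the limit.
print(clamp_latitude(-89.891973))
```

Any vertex south of the limit ends up on the same parallel, which is what sets up both problems described next.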
In the case of 206 (Rock and Ice), the issue is that the clamping creates a spur/dangling edge: in the image below note the small rectangle jutting out to the left; that rectangle is below the -85.0511287 limit, so it gets squashed into a horizontal line all of whose latitudes are -85.0511287.
When RGeo::Shapefile tries to load that record to return to us, the spur causes an exception before we ever receive the shape. Apparently spurs are treated as more serious than self-intersections, since self-intersections are tolerated at that point (as long as the RGeo::Shapefile allow_unsafe flag is on, which is supposed to allow reading invalid shapes).
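The collapse into a spur can be illustrated with a toy ring. This is only a sketch under the clamping assumption above: the rectangle coordinates are made up (they are not taken from record 206), and `clamp_ring` and `shoelace_area` are hypothetical helpers, not TW or RGeo functions.

```python
CLAMP = 85.0511287


def clamp_ring(ring, limit=CLAMP):
    """Clamp the latitude of every (lon, lat) vertex into [-limit, limit]."""
    return [(lon, max(-limit, min(limit, lat))) for lon, lat in ring]


def shoelace_area(ring):
    """Planar signed area of a closed ring of (lon, lat) pairs."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


# Made-up rectangle lying entirely south of the clamp latitude.
spur = [
    (-180.0, -86.0),
    (-179.0, -86.0),
    (-179.0, -85.5),
    (-180.0, -85.5),
    (-180.0, -86.0),
]
clamped = clamp_ring(spur)

print(shoelace_area(spur))         # 0.5
print(shoelace_area(clamped))      # zero area: the ring is now a line
print({lat for _, lat in clamped})
```

After clamping, every vertex sits at latitude -85.0511287, so the "rectangle" traces the same horizontal segment out and back, which is the kind of dangling edge that seems to trigger the exception.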
In the case of 748 (Transantarctic Mountains tundra), the issue is that an interior ring crosses the -85.0511287 threshold, part above, part below. The clamped shape becomes invalid, and when make_valid is run the process turns the part(s) of the interior ring that got squished up to -85.0511287 into a MultiLineString. (I think the interior polygon that got its lower part clamped into a line becomes an open cavity, and the line across the opening of that cavity becomes a separate geometry... something along those lines; I didn't go far enough to completely replicate this with simpler test shapes.)
As mentioned earlier, TW is able, through RGeo, to "fix" shapes that have a self-intersection, so it's not necessary to load a file with those fixed, but I think it feels better to do so. I'll use ogr2ogr:
$ ogr2ogr -makevalid Ecoregions2017_for_TW.shp Ecoregions2017.shp
(ogr2ogr, according to its man page, also removes any lower dimensional geometries created by the make-valid process, which TW does not yet do; that requires a higher version of GEOS.)
If you import that new shapefile into TW, you still find the same two issues as with the original file - that's because both of those issues are due to latitude clamping, which only occurs once shapes are read by RGeo.
Record 206 (Rock and Ice) was the shape that developed a dangling edge after clamping; one fix that works (iirc) is simply to remove the sub-polygon that jutted out and became the dangling edge. The more general fix, I think, is to clip the original shape to between latitudes 85.0511287 and -85.0511287. The longitude range of the clip rectangle would in theory be -180 to 180, but I don't trust that to work, so I'm using 179.99999999 instead. [Gory detail: ST_Intersects concludes intersection if the shapes are within 0.00001 meters of each other, so not using 180 exactly is "a good thing". In fact the value 179.99999999 should be chosen so that, in WGS84, at latitude 85.0511287, a point at longitude 179.99999999 is more than 0.00001 meters from the anti-meridian; in this case the distance is more like 0.095mm, which is greater than 0.01mm.] I'm not a wiz with QGIS, so I wrote the rectangle as a geojson file, imported that into QGIS, and clipped shape 206 to that rectangle. If you export that clipped shape to a shapefile and try to import it into TW, it imports with no errors, but the shape is all wrong:
Before moving on, note that at the bottom of the image you can tell where the clip at latitude -85.0511287 occurred.
The more notable feature though is that most of the shape seems "inside out" (it should essentially be Antarctica). That turns out to be because TW thinks the imported shape crosses the anti-meridian and so shifts its coordinates from [-180, 180] to [0, 360], which, in RGeo terms, means edges between points in opposite east/west hemispheres are joined by a line crossing the anti-meridian, not the meridian (which would correspond to the 0/360 edge, which coordinates can't cross). If you zoom into where that shape crosses the meridian:
in the middle of the image there's a gap between a lower horizontal line and a higher horizontal line - that gap is where the unshifted version of the shape has an edge that crosses the meridian - in the shifted shape the edge between those two vertices crosses the anti-meridian, i.e. wraps all the way around the globe: what looked like horizontal lines are actually slightly sloped going from the lower vertex to the upper around the globe. Adding that one changed line flips inside and out for other parts of the polygon and results in the picture above.
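The effect of the coordinate shift on a single edge can be sketched with a made-up example. The helpers here are hypothetical and only illustrate the arithmetic; they are not how TW or RGeo implement the shift.

```python
def shift_to_0_360(lon):
    """Map a longitude from the [-180, 180] convention into [0, 360)."""
    return lon % 360.0


def lon_span(lon_a, lon_b):
    """Longitude difference covered by a straight edge in the given frame."""
    return abs(lon_b - lon_a)


# Made-up edge that crosses the prime meridian, from 1 degree W to 1 degree E.
west, east = -1.0, 1.0

print(lon_span(west, east))                                  # 2.0
print(lon_span(shift_to_0_360(west), shift_to_0_360(east)))  # 358.0
```

In the [-180, 180] frame the edge is 2 degrees long and passes through longitude 0. After the shift the same two vertices sit at 359 and 1, and since coordinates can't cross the 0/360 seam the edge between them has to run 358 degrees the other way, through 180.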
Why does TW think the original imported clipped shape crosses the anti-meridian? When we clipped 206 to the pseudo-mercator box, new cutoff horizontal edges at latitude -85.0511287 were created to close in the new clipped shape; in particular, part of that edge became [ 162.96855934317523, -85.0511287 ], [ -90.028383589490403, -85.0511287 ]. Note that the longitude span of that edge is ~253, which is greater than 180. ST_Intersects interprets edges between vertices as being shortest-length geodesics, so in the mind of ST_Intersects, that edge does in fact cross the anti-meridian! There's no way to internally represent an edge with a longitude span greater than 180 degrees in RGeo with regard to ST_Intersects (other postgis functions have different interpretations, as we discussed above!), so the fix here is to manually add a vertex to that edge so that it becomes two edges, each of longitude span < 180. Naturally I chose longitude 0 to split that edge. Long story short, that one particular choice of longitude 0.0 can also get you in trouble in this scenario (see TW's anti_meridian_spec.rb for a spec noting the issue; I don't know if it's a postgis bug (which right now it seems like to me) or if there's some logical explanation, so I won't say more here). So choose any other longitude that satisfies the same criterion; I chose 10.
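The manual edit amounts to something like the following sketch. The function names are hypothetical, it only handles constant-latitude edges like the one created by the clip, and it inserts a single vertex per long edge, so the result should still be checked.

```python
def split_long_edges(coords, split_lon=10.0):
    """Insert a vertex at split_lon on constant-latitude edges spanning >= 180 degrees.

    coords is a list of (lon, lat) pairs in the [-180, 180] convention.
    Edges are read as straight lines in lon/lat, i.e. "the long way" if
    that is how the vertices are written.
    """
    out = [coords[0]]
    for (lon1, lat1), (lon2, lat2) in zip(coords, coords[1:]):
        low, high = min(lon1, lon2), max(lon1, lon2)
        if high - low >= 180.0 and lat1 == lat2 and low < split_lon < high:
            out.append((split_lon, lat1))
        out.append((lon2, lat2))
    return out


def max_lon_span(coords):
    """Largest longitude span of any edge in the coordinate list."""
    return max(abs(b[0] - a[0]) for a, b in zip(coords, coords[1:]))


# The problem edge from the clipped shape 206.
edge = [(162.96855934317523, -85.0511287), (-90.028383589490403, -85.0511287)]
fixed = split_long_edges(edge, split_lon=10.0)

print(max_lon_span(edge))   # about 253
print(fixed)
print(max_lon_span(fixed))  # about 153
```

With the extra vertex at longitude 10, both resulting edges span less than 180 degrees, so a shortest-geodesic reading of each edge agrees with the intended path.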
[More gory details: when I tried to run the edit shape tool in QGIS my system ran out of memory loading the vertex table (that's the shape with ~1.2 million vertices), so I exported the shape to geojson, edited the edge discussed above by hand, re-imported that into QGIS, and then exported it as a shapefile. That gives it the same field datatypes as what I'll get for 748 below; if I just use ogr2ogr to convert geojson to shp directly, I get different datatypes for some of the fields compared to exporting a shapefile from QGIS.]
That shapefile imports correctly, as far as I can tell. Whew.
Record 748 (Transantarctic Mountains tundra) is simpler: the original issue was that the clamped shape developed a line, which turned the imported shape into a GeometryCollection consisting of a MultiPolygon and a MultiLineString. Here if we just clip 748 to the pseudo-mercator rectangle, export it to a shapefile, and import it into TW, all is (appears to be) well; it now imports as a MultiPolygon.
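For reference, the clip rectangle used for both records, and the longitude margin check described for 206, can be sketched as follows. The exact geojson written for the original clip isn't reproduced in this document, so the feature layout and the symmetric use of -179.99999999 on the western side are assumptions. The margin is measured along the parallel, which is close enough to the true distance at this scale.

```python
import json
import math

LAT_LIMIT = 85.0511287
LON_LIMIT = 179.99999999   # assumed to be used on both the east and west sides

# WGS84 ellipsoid parameters.
A = 6378137.0
F = 1 / 298.257223563
E2 = F * (2 - F)


def metres_per_degree_lon(lat_deg):
    """Length in metres of one degree of longitude along a WGS84 parallel."""
    phi = math.radians(lat_deg)
    n = A / math.sqrt(1 - E2 * math.sin(phi) ** 2)  # prime vertical radius
    return math.radians(1.0) * n * math.cos(phi)


# Distance from longitude 179.99999999 to the anti-meridian at the clip latitude.
margin_m = (180.0 - LON_LIMIT) * metres_per_degree_lon(LAT_LIMIT)

clip_rectangle = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-LON_LIMIT, -LAT_LIMIT],
            [LON_LIMIT, -LAT_LIMIT],
            [LON_LIMIT, LAT_LIMIT],
            [-LON_LIMIT, LAT_LIMIT],
            [-LON_LIMIT, -LAT_LIMIT],
        ]],
    },
}

print(json.dumps(clip_rectangle))
print(margin_m, margin_m > 0.00001)
```

The margin comes out a little under 0.1 mm, roughly ten times the 0.00001 meter tolerance, which matches the figure quoted for 206.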
- TW's RGeo settings clamp latitudes to the range [-85.0511287, 85.0511287]; if you're importing a shape that draws outside that range you may well have issues. In the Ecoregions case the only two shapes that did so both had issues.
- TW's crosses_anti_meridian? is a fundamental tool for determining in what way the coordinates of a shape are to be imported and drawn in TW. crosses_anti_meridian? in turn just forwards to postgis's ST_Intersects. That function, afaict, takes the vertices of a shape and interprets an edge between two vertices as being the shortest-distance geodesic between them. That means you cannot import a shape that has an edge that is meant to go "the long way" around the globe; you'll get the wrong result.
- One we already knew: just because the import succeeds without an error doesn't mean there aren't any issues; the process of making an invalid shape valid doesn't always do what you'd like. We automatically make invalid shapes valid internally (for cases like above, where the original file would otherwise have had 69 failing shapes), so right now the only way to tell would be to examine each imported shape. (Two possible improvements for TW: 1) note in the UI which shapes had to be made valid; 2) all shapes imported from a polygon shapefile, e.g., should be Polygons or MultiPolygons, so if that's not the case we could add a warning for that shape.)
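Until something like improvement 2 exists, a check along these lines could be run over exported WKT. This is only a sketch: `unexpected_geometry_types` is a hypothetical helper, the WKT strings are made-up stand-ins rather than real TW output, and how you get the WKT out of TW is left open.

```python
def unexpected_geometry_types(wkt_by_name, expected=("POLYGON", "MULTIPOLYGON")):
    """Return {name: geometry type} for shapes whose WKT type is not expected."""
    unexpected = {}
    for name, wkt in wkt_by_name.items():
        text = wkt.strip()
        # Drop an EWKT prefix such as "SRID=4326;" if one is present.
        if text.upper().startswith("SRID=") and ";" in text:
            text = text.split(";", 1)[1]
        head = text.split("(", 1)[0].split()
        geom_type = head[0].upper() if head else ""
        if geom_type not in expected:
            unexpected[name] = geom_type
    return unexpected


# Made-up WKT standing in for shapes imported from a polygon shapefile.
shapes = {
    "Transantarctic Mountains tundra": (
        "GEOMETRYCOLLECTION (MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0))), "
        "MULTILINESTRING ((0 0, 1 1)))"
    ),
    "Some other ecoregion": "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))",
}

print(unexpected_geometry_types(shapes))
```

Only the GeometryCollection is reported, which is the situation the original record 748 import ended up in.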
I assume that with the new Ecoregion shapefile gazetteer, one could filter COs and CEs on a particular Ecoregion (Biome or "biogeographic realm"). However, would there be a way to create predicate fields "Ecoregion," "Biome," and "biogeographic realm" and have the data filled in based on the gazetteer shapefiles, so the names could be exported and used in other analyses?