Skip to content

Instantly share code, notes, and snippets.

@michaelkirk
Last active October 5, 2020 16:23
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save michaelkirk/3ac4afe651fe24ad229dec0b56ec8f9d to your computer and use it in GitHub Desktop.
Save michaelkirk/3ac4afe651fe24ad229dec0b56ec8f9d to your computer and use it in GitHub Desktop.

πŸ‘– 4 geo

This is a proposal to explicitly use the Dimenstionally Extended 9 Intersection Model (DE-9IM) semantics to succinctly, conventionally, and accurately implement some of our geometric relations.

georust doesn't have a history of formal proposals before making changes (yet!?), but this seemed like a substantial enough endeavor that soliciting input from other folks up front seemed helpful.

Background / Motivation

Read the above wikipedia article for more detail, but tldr: this is a common way to describes many relations between two geometries, e.g. Disjoint, Touches, Crosses, Within. Among other places, this method is used by JTS, which is the principal inspitation for GEOS, which is in turn used by GDAL, postgis, and many others.

DE-9IM relates two geometries based on the intersections of different cases (inside, outside, on boundary) of the two geometries.

For example, to formally describe when A is considered to be wholly Within B, the Simple Features spec says:

  • There must be some intersection between A's interior with B's Interior.
  • There can be no intersection between A's Interior with B's Exterior
  • There can be no intersection between A's Boundary with B's Exterior

If, and only if, the above are true, A is said to be Within B.

Written as a matrix, the intersection cases could look like:

             β”‚ B Interior  β”‚ B Boundary  β”‚ B Exterior  β”‚
           ──┼─────────────┼─────────────┼──────────────
 A Interior  β”‚ some        β”‚ dont care   β”‚ none        β”‚
           ──┼─────────────┼─────────────┼──────────────
 A Boundary  β”‚ dont care   β”‚ dont care   β”‚ none        β”‚
           ──┼─────────────┼─────────────┼──────────────
 A Exterior  β”‚ dont care   β”‚ dont care   β”‚ dont care   β”‚
           β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Often you'll see this expressed concisely as T*F**F***.

In practice, we often already refer to these specifications when implementing relations, but we have not always been rigorous about ensuring complete conformance - leaving some relations incomplete and some incorrect.

As well as completing and correcting some of our existing implementations, because there are many operations defined in terms of DE-9IM, we could relatively quickly add a series of unimplemented relations: Equals, Disjoint, Touches, Crosses, Within, Overlaps.

Proposed Changes

The proposal is to make a module explicitly implementing the DE-9IM model available to georust so we can implment conventional geometric relations on top of it.

In general, I'm inclined to defer heavily to the successful JTS project for DE-9IM architecture and algorithms. JTS is the principal inspiration for GEOS, used by gdal, postgis, and pretty much everywhere. I'm not necessarily opposed to alternative implementations if someone has a preference for another model, but I'm unaware of any that are remotely as successful.

In general, I think we should prioritize releasing a correct implementation sooner, rather than gambling with a less proven approach unless the benefits are very clear. And as always, if we can get this work done, we can always improve later. The if is key in volunteer software projects like georust, and should not be downplayed. =)

That said, we already have a substantially different geometry model from JTS, so it couldn't be a direct port, and I'm inclined to pretty liberally apply Rust idioms where they help.

Getting back to technical considerations, note that though DE-9IM provides a general and correct tool for calculating these relations, it is not always the most efficient. For many relations, there are combinations of geometries for which a special case optimization exists. Our general approach should be to exaust all reasonable optimized approaches before falling back to the generic DE-9IM Intersection Matrix approach.

A high level example of what this could look like:

// Don't get hung up on specifics please =)
struct IntesectionMatrix(blah);

impl IntesectionMatrix {
  fn is_within(&self) -> bool {
    // does self `match` "T*F**F***"
  }

  fn is_disjoin(&self) -> bool {
    // does self `match` "FF*FF****"
  }

  // ...
}

trait Relate: Intersection<Intersectable> {
  // This is the workhouse method which builds the IM from two geometries.
  //
  // I'm not at all sure that it would necessarily live in a trait quite like this,
  // but just for example sake...
  //
  // One thing I think is true is that the DE-9IM stuff will mostly be "behind the scenes"
  // e.g. I don't expect users to be interacting with GeometryGraph directly.
  fn relate(&self, other: Intersectable) -> IntersectionMatrix
}

// An example of how we could use IntersectionMatrix to implement a trait
impl Within<Polygon> for Line {
    fn within(&self, other: Polygon) -> bool {
        self.relate(other).is_within()
    }
}

impl Within<Point> for Line {
    fn within(&self, other: Point) -> bool {
        // an example optimization which avoids the need to compute the
        // relatively expensive IM
        if self.0 != self.1 {
            // unless self is zero-length, it cannot be `within` a single point
            // so we know it's false
            return false
        }

        // there is likely further/better optimization we could do here, this
        // just shows the general pattern:
        //   - prefer to handle efficient special cases
        //   - fall back to slower IM calculation if needed
        self.relate(other).is_within()
    }
}

Software Components

As a rule of thumb, it's great to deliver code iteratively. Unfortunately, the bulk of what's needed for DE-9IM computations doesn't seem super useful on it's own - a little akin to https://knowyourmeme.com/photos/572078-how-to-draw-an-owl.

That said, there are probably a few things with independent value we could merge before the rest, a few performance corners we could cut for an MVP, and then delivering the actual traits which leverage the IntersectionMatrix stuff could be done incrementally.

Using JTS as a reference, I've traced the calls required to compute the InteresectionMatrix and highlighted what I think are some of the key components and methods, and noted what could potentially be done (1.) before, (2.) as a proof of concept/MVP, and (3.) as followup work.

From JTS, there are primarily two modules of interst: Firstly, relate, which exposes the end users API for computing denim relations. In particular see `RelateComputer#computeIM() for an entry point into the process.

Secondly, the geomgraph module, which the relate module relies upon heavily. The java implementation is substantial (~5k line). geomgraph builds a topology graph - an efficient data structure for modeling geometries and the intersections between their parts.

Standalone

Some methods needed for DE-9IM are generally useful, and need not wait for the entire implementation.

Geometry Envelopes

computeEnvelopeInternal called by getEnvelopeInternal, which caches the result - so it's expected to be cached in some contexts.

used as a generic short-circuit to cheaply detect disjoin geometries. Comment explains further:

   *  Returns the minimum and maximum x and y values in this <code>Geometry</code>
   *  , or a null <code>Envelope</code> if this <code>Geometry</code> is empty.
   *  Unlike <code>getEnvelopeInternal</code>, this method calculates the <code>Envelope</code>
   *  each time it is called; <code>getEnvelopeInternal</code> caches the result
   *  of this method.
   *
   *@return    this <code>Geometry</code>s bounding box; if the <code>Geometry</code>
   *      is empty, <code>Envelope#isNull</code> will return <code>true</code>
   */ 

Envelope computation seems a fairly simple and self contained task which could be merged independently since it theoretically offers value outside of DE-9IM. I'm not sure if we'd need to incorporate envelope caching, I'd start with not, and profiling the eventual test suite to see where we end up.

Geometry Graph

First, each geometry is added to a GeometryGraph, and the "self" intersection nodes are computed.

Depends on EdgeSetIntersector<LineIntersector: SegmentIntersector>

/**
 * Finds all intersections in one or two sets of edges,
 * using an x-axis sweepline algorithm in conjunction with Monotone Chains.
 * While still O(n^2) in the worst case, this algorithm
 * drastically improves the average-case time.
 * The use of MonotoneChains as the items in the index
 * seems to offer an improvement in performance over a sweep-line alone.
 *
 * @version 1.7
 /

We need an EdgeSetIntersector - the MCSweepLine one is a somewhat fancy flavour. I'd propose something like the SimpleEdgeSetIntersector: EdgeSetIntersector for the POC, and following up with something faster once we have things working (and some test coverage).

NodeMap datastructure

Fair amount of code, but seems not too complicated and self contained. Though not really useful for anything on it's own. used by geomgraph

The bulk of the intersection work, delegated to the EdgeSetIntersector like in computeSelfNodes., so maybe this isn't really much new work.

Every point in every geometry is located relative to the other geometry (inside, outside, boundary)

Apparently something about the intersection computation can clobber the location of a geometries own points - maybe only with respect to the boundary determination rule, and maybe only if you're mixing matching boundary determination rules, I'm not sure. Anyway, here we make sure to retain the original value for self location.

If it's necessary, seems relatively simple and straight forward.

Apparently the intersection only handles nodes common to both geometries, so anything not yet touched is explicitly marked as exterior WRT the other geom.

At this point we have a kind of intersection matrix. This method transforms that output based on whether the intersections are "proper"

/** 
 * A proper intersection is an intersection which is interior to at least two
 * line segments.  Note that a proper intersection is not necessarily
 * in the interior of the entire Geometry, since another edge may have
 * an endpoint equal to the intersection, which according to SFS semantics
 * can result in the point being on the Boundary of the Geometry.
 */
public boolean hasProperIntersection() { return hasProper; }
/** 
 * A proper interior intersection is a proper intersection which is <b>not</b>
 * contained in the set of boundary nodes set for this SegmentIntersector.
 */
public boolean hasProperInteriorIntersection() { return hasProperInterior; }

And further on the dimensionality of input (e.g. point.relate(polygon) vs line.relate(point)) like:

	if (dimA == 2 && dimB == 2) {
  if (hasProper) im.setAtLeast("212101212");
}
/**
 * Now process improper intersections
 * (eg where one or other of the geometries has a vertex at the intersection point)
 * We need to compute the edge graph at all nodes to determine the IM.
 */

// build EdgeEnds for all intersections
EdgeEndBuilder eeBuilder = new EdgeEndBuilder();
List ee0 = eeBuilder.computeEdgeEnds(arg[0].getEdgeIterator());
insertEdgeEnds(ee0);
List ee1 = eeBuilder.computeEdgeEnds(arg[1].getEdgeIterator());
insertEdgeEnds(ee1);


/**
 * A EdgeEndStar is an ordered list of EdgeEnds around a node.
 * They are maintained in CCW order (starting with the positive x-axis) around the node
 * for efficient lookup and topology building.
 *
 * @version 1.7
 */
abstract public class EdgeEndStar

Finally... updateIM

/**
 * update the IM with the sum of the IMs for each component
 */
private void updateIM(IntersectionMatrix im)

Seemingly not that much work on it's own, but relies on some helper methods in IntersectionMatrix to exist, e.g. im.setAtLeastIfValid.

Other possible optimizations

As a naive approach, and maybe if we are checking multiple relations between a pair of geometries, it will often make sense to return a fully populated IntersectionMatrix.

However, building an IntersectionMatrix is relatively expensive, and by recognizing that most single relations don't care about many of the cells in the matrix, maybe it makese sense to offer an api which computes only the values relevant to the relation we've evaluating. Consider again:

impl Within<MultiPolygon> for MultiPolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    // DE-9IM definition: "T*F**F***"
    //
    // which translates to:
    //           
    //             β”‚ B Interior  β”‚ B Boundary  β”‚ B Exterior  β”‚
    //           ──┼─────────────┼─────────────┼──────────────
    // A Interior  β”‚ T           β”‚ dont care   β”‚ F           β”‚
    //           ──┼─────────────┼─────────────┼──────────────
    // A Boundary  β”‚ dont care   β”‚ dont care   β”‚ F           β”‚
    //           ──┼─────────────┼─────────────┼──────────────
    // A Exterior  β”‚ dont care   β”‚ dont care   β”‚ dont care   β”‚
    //           β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

    let cell_0_0 = self.relate_case(other, (Interior, Interior))
    let cell_0_2 = self.relate_case(other, (Interior, Exterior))
    let cell_1_2 = self.relate_case(other, (Boundary, Exterior))

    return cell_0_0.is_t() && cell_0_2.is_f() && cell_1_2().is_f()
  }
}

Or imagine for a moment, a more one-shot API like this:

impl Within<MultiPolygon> for MultiPolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    // DE-9IM definition: "T*F**F***"
    self.relate_cases(other, vec![
        (Interior, Interior, Dim::T)
        (Interior, Exterior, Dim::F)
        (Boundary, Exterior, Dim::F)
    ])
  }
}

From there, it's easy to envision building up a call dynamically by parsing an actual string:

impl Within<Multipolygon> for Multipolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    self.relate_string(other, "T*F**F***");
  }
}

And of course, the natural conclusion for ultimate style, a compile time macro:

impl Within<Multipolygon> for Multipolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    relate_denim!(self, other, "T*F**F***")
  }
}

At first, this seemed to me like a nice API to implement, and it may be for ergonomics of writing, but as a performance optimization it seems less promising. The geomgraph doesn't work like this. It's not doing 9 independent computations that we can easily short-circuit. To compute even one of the nine cells, we'd still need to feed the geometries into a topology graph, which, as far as I can tell, represents the bulk of the work. I bring this up because it was previously mentioned. I'm happy to entertain other proposals if specifics are provided.

Priorities

The task list!

0. independent pre-work

  • geometery envelopes

1. POC ("draw the rest of the owl")

  • Simple EdgeSetIntersector and friends (LineIntersector/EdgeIntersector)
  • GeomGraph - this is big!
  • Building the IntersectionMatrix
    • amassing/porting a test suite at this layer seems especially useful
  • IntersectionMatrix convenience methods
    • is_contains, etc.
    • maybe ergonomic macro-ish invocation, i.e. relate!(a, b, "TFF")

2. Followup

  • fix existing broken traits (Contains)
  • faster EdgeSetIntersector
  • exhaustively implementing relationshipt traits for all geometries, incorporating plausible optimizations per geometry
    • Within
    • Disjoint
    • etc. from OS Geo Simple Feature Spec

As currently phrased, there is a little room for code-writing collaboration if other people are interested in contributing to this, but there are also some pretty broad swaths of serialized work that I haven't tried to decompose/decouple, especially until the POC is done. After that point it gets a littel better. If someone has compelling ideas on how to further decompose into useful parts, I'm listening!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment