michaelkirk/geo_DE-9IM.md

## geo_DE-9IM.md

      
    Raw
  

              geo_DE-9IM.md
            
          
    👖 4 geo

This is a proposal to explicitly use the Dimenstionally Extended 9 Intersection
Model (DE-9IM) semantics to succinctly,
conventionally, and accurately implement some of our geometric relations.
georust doesn't have a history of formal proposals before making changes (yet!?), but this seemed like a substantial enough endeavor that soliciting input from other folks up front seemed helpful.
Background / Motivation

Read the above wikipedia article for more detail, but tldr: this is a common
way to describes many relations between two geometries, e.g.  Disjoint,
Touches, Crosses, Within. Among other places, this method is used by JTS, which is the principal inspitation for GEOS, which is in turn used by GDAL, postgis, and many others.
DE-9IM relates two geometries based on the intersections of different cases
(inside, outside, on boundary) of the two geometries.
For example, to formally describe when A is considered to be wholly Within B, the Simple Features spec says:

There must be some intersection between A's interior with B's Interior.
There can be no intersection between A's Interior with B's Exterior
There can be no intersection between A's Boundary with B's Exterior

If, and only if, the above are true, A is said to be Within B.
Written as a matrix, the intersection cases could look like:
             │ B Interior  │ B Boundary  │ B Exterior  │
           ──┼─────────────┼─────────────┼─────────────┤
 A Interior  │ some        │ dont care   │ none        │
           ──┼─────────────┼─────────────┼─────────────┤
 A Boundary  │ dont care   │ dont care   │ none        │
           ──┼─────────────┼─────────────┼─────────────┤
 A Exterior  │ dont care   │ dont care   │ dont care   │
           ──┴─────────────┴─────────────┴─────────────┘

Often you'll see this expressed concisely as T*F**F***.
In practice, we often already refer to these specifications when implementing
relations, but we have not always been rigorous about ensuring
complete conformance - leaving some relations incomplete and some incorrect.
As well as completing and correcting some of our existing implementations,
because there are many operations defined in terms of DE-9IM, we could
relatively quickly add a series of unimplemented relations: Equals,
Disjoint, Touches, Crosses, Within, Overlaps.
Proposed Changes

The proposal is to make a module explicitly implementing the DE-9IM model available to
georust so we can implment conventional geometric relations on top of it.
In general, I'm inclined to defer heavily to the successful JTS project for
DE-9IM architecture and algorithms. JTS is the principal inspiration for GEOS,
used by gdal, postgis, and pretty much everywhere. I'm not necessarily opposed
to alternative implementations if someone has a preference for another model,
but I'm unaware of any that are remotely as successful.
In general, I think we should prioritize releasing a correct implementation
sooner, rather than gambling with a less proven approach unless the benefits
are very clear. And as always, if we can get this work done, we can always
improve later. The if is key in volunteer software projects like georust,
and should not be downplayed. =)
That said, we already have a substantially different geometry model from JTS, so it couldn't
be a direct port, and I'm inclined to pretty liberally apply Rust idioms where they help.
Getting back to technical considerations, note that though DE-9IM provides a
general and correct tool for calculating these relations, it is not always
the most efficient. For many relations, there are combinations of geometries
for which a special case optimization exists. Our general approach should be
to exaust all reasonable optimized approaches before falling back to the
generic DE-9IM Intersection Matrix approach.
A high level example of what this could look like:
// Don't get hung up on specifics please =)
struct IntesectionMatrix(blah);

impl IntesectionMatrix {
  fn is_within(&self) -> bool {
    // does self `match` "T*F**F***"
  }

  fn is_disjoin(&self) -> bool {
    // does self `match` "FF*FF****"
  }

  // ...
}

trait Relate: Intersection<Intersectable> {
  // This is the workhouse method which builds the IM from two geometries.
  //
  // I'm not at all sure that it would necessarily live in a trait quite like this,
  // but just for example sake...
  //
  // One thing I think is true is that the DE-9IM stuff will mostly be "behind the scenes"
  // e.g. I don't expect users to be interacting with GeometryGraph directly.
  fn relate(&self, other: Intersectable) -> IntersectionMatrix
}

// An example of how we could use IntersectionMatrix to implement a trait
impl Within<Polygon> for Line {
    fn within(&self, other: Polygon) -> bool {
        self.relate(other).is_within()
    }
}

impl Within<Point> for Line {
    fn within(&self, other: Point) -> bool {
        // an example optimization which avoids the need to compute the
        // relatively expensive IM
        if self.0 != self.1 {
            // unless self is zero-length, it cannot be `within` a single point
            // so we know it's false
            return false
        }

        // there is likely further/better optimization we could do here, this
        // just shows the general pattern:
        //   - prefer to handle efficient special cases
        //   - fall back to slower IM calculation if needed
        self.relate(other).is_within()
    }
}

Software Components

As a rule of thumb, it's great to deliver code iteratively. Unfortunately, the
bulk of what's needed for DE-9IM computations doesn't seem super useful on it's
own - a little akin to
https://knowyourmeme.com/photos/572078-how-to-draw-an-owl.
That said, there are probably a few things with independent value we could merge
before the rest, a few performance corners we could cut for an MVP, and then
delivering the actual traits which leverage the IntersectionMatrix stuff could
be done incrementally.
Using JTS as a reference, I've traced the calls required to compute the
InteresectionMatrix and highlighted what I think are some of the key components and methods,
and noted what could potentially be done (1.) before, (2.) as a proof of
concept/MVP, and (3.) as followup work.
From JTS, there are primarily two modules of interst:
Firstly, relate, which exposes the end users API for computing denim relations. In particular see
`RelateComputer#computeIM() for an entry point into the process.
Secondly, the geomgraph
module, which the relate module relies upon heavily. The java implementation is substantial (~5k line).
geomgraph builds a topology graph - an efficient data structure for modeling
geometries and the intersections between their parts.
Standalone

Some methods needed for DE-9IM are generally useful, and need not wait for the entire implementation.
Geometry Envelopes

computeEnvelopeInternal called by getEnvelopeInternal, which caches the result - so it's expected to be cached in some contexts.
used as a generic short-circuit to cheaply detect disjoin geometries.  Comment explains further:
   *  Returns the minimum and maximum x and y values in this <code>Geometry</code>
   *  , or a null <code>Envelope</code> if this <code>Geometry</code> is empty.
   *  Unlike <code>getEnvelopeInternal</code>, this method calculates the <code>Envelope</code>
   *  each time it is called; <code>getEnvelopeInternal</code> caches the result
   *  of this method.
   *
   *@return    this <code>Geometry</code>s bounding box; if the <code>Geometry</code>
   *      is empty, <code>Envelope#isNull</code> will return <code>true</code>
   */ 

Envelope computation seems a fairly simple and self contained task which could
be merged independently since it theoretically offers value outside of DE-9IM. I'm not sure if we'd need to incorporate envelope caching, I'd start with not, and profiling the eventual test suite to see where we end up.
Geometry Graph

geomgraph computeSelfNodes

First, each geometry is added to a GeometryGraph, and the "self" intersection nodes are computed.
Depends on EdgeSetIntersector<LineIntersector: SegmentIntersector>
SimpleMCSweepLineIntersector: EdgeSetIntersector

/**
 * Finds all intersections in one or two sets of edges,
 * using an x-axis sweepline algorithm in conjunction with Monotone Chains.
 * While still O(n^2) in the worst case, this algorithm
 * drastically improves the average-case time.
 * The use of MonotoneChains as the items in the index
 * seems to offer an improvement in performance over a sweep-line alone.
 *
 * @version 1.7
 /

We need an EdgeSetIntersector - the MCSweepLine one is a somewhat fancy
flavour.  I'd propose something like the SimpleEdgeSetIntersector: EdgeSetIntersector for the POC, and following up with something faster once we
have things working (and some test coverage).
NodeMap datastructure

Fair amount of code, but seems not too complicated and self contained. Though not really useful for anything on it's own. used by geomgraph
Next... computeEdgeIntersections

The bulk of the intersection work, delegated to the EdgeSetIntersector like in computeSelfNodes., so maybe this isn't really much new work.
Every point in every geometry is located relative to the other geometry (inside, outside, boundary)
Next... copyNodesAndLabels

Apparently something about the intersection computation can clobber the
location of a geometries own points - maybe only with respect to the boundary
determination rule, and maybe only if you're mixing matching boundary
determination rules, I'm not sure. Anyway, here we make sure to retain the
original value for self location.
If it's necessary, seems relatively simple and straight forward.
Next...  labelIsolatedNodes

Apparently the intersection only handles nodes common to both geometries, so
anything not yet touched is explicitly marked as exterior WRT the other geom.
Next... computeProperIntersectionIM

At this point we have a kind of intersection matrix. This method transforms
that output based on whether the intersections are "proper"
/** 
 * A proper intersection is an intersection which is interior to at least two
 * line segments.  Note that a proper intersection is not necessarily
 * in the interior of the entire Geometry, since another edge may have
 * an endpoint equal to the intersection, which according to SFS semantics
 * can result in the point being on the Boundary of the Geometry.
 */
public boolean hasProperIntersection() { return hasProper; }
/** 
 * A proper interior intersection is a proper intersection which is <b>not</b>
 * contained in the set of boundary nodes set for this SegmentIntersector.
 */
public boolean hasProperInteriorIntersection() { return hasProperInterior; }

And further on the dimensionality of input (e.g. point.relate(polygon) vs
line.relate(point))  like:
	if (dimA == 2 && dimB == 2) {
  if (hasProper) im.setAtLeast("212101212");
}

Next... "improper intersections"

/**
 * Now process improper intersections
 * (eg where one or other of the geometries has a vertex at the intersection point)
 * We need to compute the edge graph at all nodes to determine the IM.
 */

// build EdgeEnds for all intersections
EdgeEndBuilder eeBuilder = new EdgeEndBuilder();
List ee0 = eeBuilder.computeEdgeEnds(arg[0].getEdgeIterator());
insertEdgeEnds(ee0);
List ee1 = eeBuilder.computeEdgeEnds(arg[1].getEdgeIterator());
insertEdgeEnds(ee1);


/**
 * A EdgeEndStar is an ordered list of EdgeEnds around a node.
 * They are maintained in CCW order (starting with the positive x-axis) around the node
 * for efficient lookup and topology building.
 *
 * @version 1.7
 */
abstract public class EdgeEndStar

Finally... updateIM

/**
 * update the IM with the sum of the IMs for each component
 */
private void updateIM(IntersectionMatrix im)

Seemingly not that much work on it's own, but relies on some helper methods in IntersectionMatrix to exist, e.g. im.setAtLeastIfValid.
Other possible optimizations

As a naive approach, and maybe if we are checking multiple relations between a
pair of geometries, it will often make sense to return a fully populated
IntersectionMatrix.
However, building an IntersectionMatrix is relatively expensive, and by
recognizing that most single relations don't care about many of the cells in
the matrix, maybe it makese sense to offer an api which computes only the
values relevant to the relation we've evaluating. Consider again:
impl Within<MultiPolygon> for MultiPolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    // DE-9IM definition: "T*F**F***"
    //
    // which translates to:
    //           
    //             │ B Interior  │ B Boundary  │ B Exterior  │
    //           ──┼─────────────┼─────────────┼─────────────┤
    // A Interior  │ T           │ dont care   │ F           │
    //           ──┼─────────────┼─────────────┼─────────────┤
    // A Boundary  │ dont care   │ dont care   │ F           │
    //           ──┼─────────────┼─────────────┼─────────────┤
    // A Exterior  │ dont care   │ dont care   │ dont care   │
    //           ──┼─────────────┴─────────────┴─────────────┘

    let cell_0_0 = self.relate_case(other, (Interior, Interior))
    let cell_0_2 = self.relate_case(other, (Interior, Exterior))
    let cell_1_2 = self.relate_case(other, (Boundary, Exterior))

    return cell_0_0.is_t() && cell_0_2.is_f() && cell_1_2().is_f()
  }
}

Or imagine for a moment, a more one-shot API like this:
impl Within<MultiPolygon> for MultiPolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    // DE-9IM definition: "T*F**F***"
    self.relate_cases(other, vec![
        (Interior, Interior, Dim::T)
        (Interior, Exterior, Dim::F)
        (Boundary, Exterior, Dim::F)
    ])
  }
}

From there, it's easy to envision building up a call dynamically by parsing an
actual string:
impl Within<Multipolygon> for Multipolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    self.relate_string(other, "T*F**F***");
  }
}

And of course, the natural conclusion for ultimate style, a compile time macro:
impl Within<Multipolygon> for Multipolygon {
  fn within(&self, other: MultiPolygon) -> bool {
    relate_denim!(self, other, "T*F**F***")
  }
}

At first, this seemed to me like a nice API to implement, and it may be for
ergonomics of writing,  but as a performance optimization it seems less
promising. The geomgraph doesn't work like this. It's not doing 9 independent
computations that we can easily short-circuit. To compute even one of the
nine cells, we'd still need to feed the geometries into a topology graph,
which, as far as I can tell, represents the bulk of the work. I bring this up
because it was previously mentioned. I'm happy to entertain other proposals if
specifics are provided.
Priorities

The task list!
0. independent pre-work


 geometery envelopes

1. POC ("draw the rest of the owl")


 Simple EdgeSetIntersector and friends (LineIntersector/EdgeIntersector)
 GeomGraph - this is big!
 Building the IntersectionMatrix

 amassing/porting a test suite at this layer seems especially useful


 IntersectionMatrix convenience methods

 is_contains, etc.
 maybe ergonomic macro-ish invocation, i.e. relate!(a, b, "TFF")


2. Followup


 fix existing broken traits (Contains)
 faster EdgeSetIntersector
 exhaustively implementing relationshipt traits for all geometries, incorporating plausible optimizations per geometry

 Within
 Disjoint
 etc. from OS Geo Simple Feature Spec


As currently phrased, there is a little room for code-writing collaboration if other people are interested in contributing to this, but there are also some pretty broad swaths of serialized work that I haven't tried to decompose/decouple, especially until the POC is done. After that point it gets a littel better. If someone has compelling ideas on how to further decompose into useful parts, I'm listening!