Skip to content

Instantly share code, notes, and snippets.

@kgjenkins
Last active May 26, 2018 16:44
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kgjenkins/089fa8f4cb8d5fbea5037351f1d44fa7 to your computer and use it in GitHub Desktop.
Save kgjenkins/089fa8f4cb8d5fbea5037351f1d44fa7 to your computer and use it in GitHub Desktop.
Faster, faster distance matrix

Faster, faster distance matrix

Suppose you have a set of points, and would like to calculate the distance from each point to every other point. QGIS provides a "Distance Matrix" tool that can do this. It can actually produce several types of distance matrices, but for now this post will focus on what the tool calls a "Linear (N*k x 3) distance matrix", which gives us an output that looks like this:

InputID TargetID Distance
1 1 0
1 2 344
1 3 398
2 1 344
2 2 0
2 3 572
3 1 398
3 2 572
3 3 0

When working with a large number of points, it can take a while to run, so I started to look at other ways to calculate this same matrix in QGIS.

After some initial experimentation, I decided to use the DB Manager in QGIS 2.99 to test the following query on a set of 300 randomly-generated points saved in different data storage formats:

select
 a.id as id1,
 b.id as id2,
 st_distance(a.geometry, b.geometry, 1) as dist
from
 points as a,
 points as b
where a.id <> b.id

This query outputs a table of 90,000 rows. Since DB Manager doesn't always report the response time, I timed each run with a stopwatch. So the times are approximate, but close enough for a fair comparison. Here are the results (all times in seconds):

Storage format Distance Matrix Virtual Layer SQL native SQL
shapefile 11 10 x
spatialite 63 10 3
geopackage 30 10 xx
postgis 18 11 0.6

x = shapefiles have no native SQL access
xx = I kept getting NULL distance values when running the query on the geopackage (not sure why)

There are several interesting things to notice here. First, it was rather suprising that the Distance Matrix tool works much faster on shapefiles, which were nearly 6x faster than spatialite files. My guess is that this built-in tool was designed with shapefiles in mind, and does not take advantage of some of the benefits of the other formats.

It's also interesting that QGIS virtual layers basically runs all the SQL queries at the same speed regardless of the format. I'm not entirely certain, but I suspect that this is because virtual layers uses spatialite behind the scenes (or so I heard). So once the data is loaded into a temporary spatialite database (or maybe linked as a virtual table?), it's the same spatialite code doing the same work.

Finally, the main lesson from this is that it is much, much faster to run an SQL query directly on a spatialite or postgis table, and postgis wins by a lot. (Sorry, geopackage, but I just couldn't find the syntax to make you happy.)

Also, if you are already working with data in postgis or spatialite databases, you might want to try writing SQL queries instead of relying on some of the built-in processing tools that might not be optimized for database formats.

@barryrowlingson
Copy link

st_distance(a.geometry, b.geometry,1) works for me in QGIS with a geopackage if its in EPSG:4326 - otherwise if I leave out the last parameter st_distance(a.geometry, b.geometry) it works for any projection...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment