Skip to content

Instantly share code, notes, and snippets.

@rafapereirabr
Last active April 24, 2016 18:27
Show Gist options
  • Save rafapereirabr/b75a30402f9e92dd3b86f27f692d3e74 to your computer and use it in GitHub Desktop.
Save rafapereirabr/b75a30402f9e92dd3b86f27f692d3e74 to your computer and use it in GitHub Desktop.
How to efficiently calculate distance between pair of coordinates using data.table :=

This is the fastest way I've found so far to calculate the distances between pairs of lat long coordinates. The method takes advantage of distGeo{geosphere} and the crazy fast := operator of data.table.

Load libraries and data for Reproducible example

# load libraries
  library(data.table)
  library(dplyr)
  library(sp)
  library(rgeos)
  library(UScensus2000tract)
  library(geosphere)

# load data and create an Origin-Destination matrix
  data("oregon.tract")

# get centroids as a data.frame
  centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))

# Convert row names into first column
  setDT(centroids, keep.rownames = TRUE)[]

# create Origin-destination matrix
  orig <- centroids[1:754, ]
  dest <- centroids[2:755, ]
  odmatrix <- bind_cols(orig,dest)
  colnames(odmatrix) <- c("origi_id", "long_orig", "lat_orig", "dest_id", "long_dest", "lat_dest")

Let's make the data set larger to compare speed performance.

# Multiplicate data observations by 1000 
  odmatrix <- odmatrix[rep(seq_len(nrow(odmatrix)), 1000), ]

###Slow solution

system.time(
           odmatrix$dist_km <- sapply(1:nrow(odmatrix),function(i)
             spDistsN1(as.matrix(odmatrix[i,2:3]),as.matrix(odmatrix[i,5:6]),longlat=T)) 
            )

 >   user  system elapsed 
 >   222.17    0.08  222.84 

###Fast solution


# convert the data.frame to a data.table
  setDT(odmatrix)

system.time(
            odmatrix[ , dist_km2 := distGeo(matrix(c(long_orig, lat_orig), ncol = 2), 
                                            matrix(c(long_dest, lat_dest), ncol = 2))/1000]
           )

>   user  system elapsed 
>   1.76    0.03    1.79 

As you can see, the faster solution was more than 120 times faster than the slower alternative in my laptop.

Warning: The results are not identical, but they are roughly the same (a few centimeters-difference) for this specific example. For longer distances, though, the difference might get a bit bigger. This is because spDistsN1{sp} uses the Euclidean or Great Circle distance between points, while distGeo{geosphere} calculates distance on an ellipsoid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment