Last active
July 18, 2018 09:24
Star
You must be signed in to star a gist
Compare Column-Level vs. Row-Level Filtering in R
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#' Generate random Tour | |
#' | |
#' Generates a random Tour with nrows number of stops with x/y coordinates and | |
#' a 4-digit random ZIP-code. | |
#' | |
#' @param nrows numeric, the number of stops | |
#' | |
#' @return data.frame, the tour | |
#' @export | |
generate_random_tour <- function(nrows) { | |
x <- c(runif(nrows, min = 0, max = 100)) | |
y <- c(runif(nrows, min = 0, max = 100)) | |
loc <- c(floor(runif(nrows, 1000, 9999))) | |
return(data.frame(x = x, y = y, loc = loc, stringsAsFactors = FALSE)) | |
} | |
#' Calculate Tour distance | |
#' | |
#' Calculates the total Tour distance by choosing the column and then | |
#' the index of the stops to be adressed. | |
#' | |
#' @param tour data.frame, the tour | |
#' | |
#' @return numeric, the distance | |
#' @export | |
calc_tour_dist_optimized_filtering <- function(tour) { | |
assertthat::assert_that(nrow(tour) > 1) | |
len <- nrow(tour) | |
dist <- 0.0 | |
for (i in 2:len) { | |
dist <- dist + sqrt((tour$x[i-1]-tour$x[i])^2 + (tour$y[i-1]-tour$y[i])^2) | |
} | |
return (dist) | |
} | |
#' Calculate Tour distance | |
#' | |
#' Calculates the total Tour distance by first selecting the row followed by | |
#' the column of the stops to be adressed. | |
#' | |
#' @param tour data.frame, the tour | |
#' | |
#' @return numeric, the distance | |
#' @export | |
calc_tour_dist <- function(tour) { | |
assertthat::assert_that(nrow(tour) > 1) | |
len <- nrow(tour) | |
dist <- 0.0 | |
for (i in 2:len) { | |
dist <- dist + sqrt((tour[i-1,]$x-tour[i,]$x)^2 + (tour[i-1,]$y-tour[i,]$y)^2) | |
} | |
return (dist) | |
} | |
test <- function(nrows, nstops) { | |
for (i in 1:nrows) { | |
tour <- generate_random_tour(nstops) | |
calc_tour_dist(tour) | |
} | |
} | |
test_optimized_filtering <- function(nrows, nstops) { | |
for (i in 1:nrows) { | |
tour <- generate_random_tour(nstops) | |
calc_tour_dist_optimized_filtering(tour) | |
} | |
} | |
run_test_std_vs_optimized_filtering <- function(nrows, nstops) { | |
a <- system.time(test(nrows, nstops)) | |
cat("standard: ", a[1], "\n") | |
a <- system.time(test_optimized_filtering(nrows, nstops)) | |
cat("optimized: ", a[1]) | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
In the example above we demonstrate the speedup that can be attained by accessing fields in a data.frame in an optimized manner. Depending on the parameters chosen for the test a Speedup of 1 - 6 can be achieved, when using the advantageous filtering-precedence. Compare the two implementations calc_tour_dist and calc_tour_dist_optimized_filtering. They only differ in the way they're accessing the data. When we first select the column and then the row, R performs much faster, than in the base-Version, where we choose a given row and then select the attribute of interest.