Skip to content

Instantly share code, notes, and snippets.

@phabee
Last active Jul 18, 2018
Embed
What would you like to do?
Compare Column-Level vs. Row-Level Filtering in R
#' Generate random Tour
#'
#' Generates a random Tour with nrows number of stops with x/y coordinates and
#' a 4-digit random ZIP-code.
#'
#' @param nrows numeric, the number of stops
#'
#' @return data.frame, the tour
#' @export
generate_random_tour <- function(nrows) {
x <- c(runif(nrows, min = 0, max = 100))
y <- c(runif(nrows, min = 0, max = 100))
loc <- c(floor(runif(nrows, 1000, 9999)))
return(data.frame(x = x, y = y, loc = loc, stringsAsFactors = FALSE))
}
#' Calculate Tour distance
#'
#' Calculates the total Tour distance by choosing the column and then
#' the index of the stops to be adressed.
#'
#' @param tour data.frame, the tour
#'
#' @return numeric, the distance
#' @export
calc_tour_dist_optimized_filtering <- function(tour) {
assertthat::assert_that(nrow(tour) > 1)
len <- nrow(tour)
dist <- 0.0
for (i in 2:len) {
dist <- dist + sqrt((tour$x[i-1]-tour$x[i])^2 + (tour$y[i-1]-tour$y[i])^2)
}
return (dist)
}
#' Calculate Tour distance
#'
#' Calculates the total Tour distance by first selecting the row followed by
#' the column of the stops to be adressed.
#'
#' @param tour data.frame, the tour
#'
#' @return numeric, the distance
#' @export
calc_tour_dist <- function(tour) {
assertthat::assert_that(nrow(tour) > 1)
len <- nrow(tour)
dist <- 0.0
for (i in 2:len) {
dist <- dist + sqrt((tour[i-1,]$x-tour[i,]$x)^2 + (tour[i-1,]$y-tour[i,]$y)^2)
}
return (dist)
}
test <- function(nrows, nstops) {
for (i in 1:nrows) {
tour <- generate_random_tour(nstops)
calc_tour_dist(tour)
}
}
test_optimized_filtering <- function(nrows, nstops) {
for (i in 1:nrows) {
tour <- generate_random_tour(nstops)
calc_tour_dist_optimized_filtering(tour)
}
}
run_test_std_vs_optimized_filtering <- function(nrows, nstops) {
a <- system.time(test(nrows, nstops))
cat("standard: ", a[1], "\n")
a <- system.time(test_optimized_filtering(nrows, nstops))
cat("optimized: ", a[1])
}
@phabee
Copy link
Author

phabee commented Jul 18, 2018

In the example above we demonstrate the speedup that can be attained by accessing fields in a data.frame in an optimized manner. Depending on the parameters chosen for the test a Speedup of 1 - 6 can be achieved, when using the advantageous filtering-precedence. Compare the two implementations calc_tour_dist and calc_tour_dist_optimized_filtering. They only differ in the way they're accessing the data. When we first select the column and then the row, R performs much faster, than in the base-Version, where we choose a given row and then select the attribute of interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment