# phabee/r_perf_test_4.R

Last active July 18, 2018 09:24
Compare Column-Level vs. Row-Level Filtering in R
    #' Generate random Tour
    #'
    #' Generates a random Tour with nrows number of stops with x/y coordinates and
    #' a 4-digit random ZIP-code.
    #'
    #' @param nrows numeric, the number of stops
    #'
    #' @return data.frame, the tour
    #' @export
    generate_random_tour <- function(nrows) {
      x <- c(runif(nrows, min = 0, max = 100))
      y <- c(runif(nrows, min = 0, max = 100))
      loc <- c(floor(runif(nrows, 1000, 9999)))
      return(data.frame(x = x, y = y, loc = loc, stringsAsFactors = FALSE))
    }

    #' Calculate Tour distance
    #'
    #' Calculates the total Tour distance by choosing the column and then
    #' the index of the stops to be addressed.
    #'
    #' @param tour data.frame, the tour
    #'
    #' @return numeric, the distance
    #' @export
    calc_tour_dist_optimized_filtering <- function(tour) {
      assertthat::assert_that(nrow(tour) > 1)
      len <- nrow(tour)
      dist <- 0.0
      for (i in 2:len) {
        # Column first: tour$x is a plain vector, which is then indexed by position
        dist <- dist + sqrt((tour$x[i - 1] - tour$x[i])^2 + (tour$y[i - 1] - tour$y[i])^2)
      }
      return(dist)
    }

    #' Calculate Tour distance
    #'
    #' Calculates the total Tour distance by first selecting the row followed by
    #' the column of the stops to be addressed.
    #'
    #' @param tour data.frame, the tour
    #'
    #' @return numeric, the distance
    #' @export
    calc_tour_dist <- function(tour) {
      assertthat::assert_that(nrow(tour) > 1)
      len <- nrow(tour)
      dist <- 0.0
      for (i in 2:len) {
        # Row first: tour[i, ] builds a one-row data.frame before $x is extracted
        dist <- dist + sqrt((tour[i - 1, ]$x - tour[i, ]$x)^2 + (tour[i - 1, ]$y - tour[i, ]$y)^2)
      }
      return(dist)
    }

    # Note: in the test helpers below, nrows is the number of random tours
    # generated and nstops is the number of stops per tour.
    test <- function(nrows, nstops) {
      for (i in 1:nrows) {
        tour <- generate_random_tour(nstops)
        calc_tour_dist(tour)
      }
    }

    test_optimized_filtering <- function(nrows, nstops) {
      for (i in 1:nrows) {
        tour <- generate_random_tour(nstops)
        calc_tour_dist_optimized_filtering(tour)
      }
    }

    run_test_std_vs_optimized_filtering <- function(nrows, nstops) {
      # a[1] is the user CPU time reported by system.time()
      a <- system.time(test(nrows, nstops))
      cat("standard: ", a[1], "\n")
      a <- system.time(test_optimized_filtering(nrows, nstops))
      cat("optimized: ", a[1])
    }

### phabee commented Jul 18, 2018

The example above demonstrates the speedup that can be attained by accessing fields in a data.frame in an optimized order. Depending on the parameters chosen for the test, a speedup factor of about 1 to 6 can be achieved by using the advantageous filtering precedence. Compare the two implementations calc_tour_dist and calc_tour_dist_optimized_filtering: they differ only in the way they access the data. When we first select the column and then the row (tour$x[i]), R runs much faster than in the baseline version, where we first choose a given row and then select the attribute of interest (tour[i, ]$x). The likely reason is that tour[i, ] has to construct a new one-row data.frame on every access, whereas tour$x simply returns the existing column vector, which is then indexed by position.
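
To make the difference between the two access orders easier to check by hand, here is a minimal sketch on a tiny fixed tour. The function names dist_row_first and dist_col_first and the data in tour and big are hypothetical and not part of the gist. They only mirror the two loops above without the assertthat dependency. The timings are stored rather than asserted, since they depend on the machine and the R version.

```r
# Hypothetical three-stop tour: (0,0) -> (3,4) -> (3,0), so the legs are 5 and 4.
tour <- data.frame(x = c(0, 3, 3), y = c(0, 4, 0))

# Row first: tour[i, ] builds a one-row data.frame before $x is extracted.
dist_row_first <- function(tour) {
  dist <- 0
  for (i in 2:nrow(tour)) {
    dist <- dist + sqrt((tour[i - 1, ]$x - tour[i, ]$x)^2 +
                        (tour[i - 1, ]$y - tour[i, ]$y)^2)
  }
  dist
}

# Column first: tour$x returns the column vector, which is then indexed.
dist_col_first <- function(tour) {
  dist <- 0
  for (i in 2:nrow(tour)) {
    dist <- dist + sqrt((tour$x[i - 1] - tour$x[i])^2 +
                        (tour$y[i - 1] - tour$y[i])^2)
  }
  dist
}

d_row <- dist_row_first(tour)
d_col <- dist_col_first(tour)
print(c(row_first = d_row, col_first = d_col))  # both are 9 (5 + 4)

# Rough timing on a larger random tour (size chosen arbitrarily for illustration).
big <- data.frame(x = runif(2000), y = runif(2000))
t_row <- system.time(d_big_row <- dist_row_first(big))["elapsed"]
t_col <- system.time(d_big_col <- dist_col_first(big))["elapsed"]
cat("row first:   ", t_row, "s\n")
cat("column first:", t_col, "s\n")
```

Both functions return the same distance. Only the order of the subsetting operations changes, which is what the comparison in the gist isolates.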