@rinze
Last active December 21, 2015 15:58
Timing differences for simple vectorized operations on an R matrix versus a data.frame.
# Timing measurement in R. Vectorized operation on matrix / data.frame
# Author: José María Mateos - jmmateos@ieee.org
#
# For certain vectorized operations, it makes sense to convert your data.frame
# into a matrix first. Even with apply, iterating over a data.frame is slower
# than iterating over a matrix, and vectorized arithmetic on a data.frame can
# be slower still.
#
# In this example, I compute the Euclidean distance between a random vector
# and each column of a random matrix / data.frame (50 rows x 15000 columns).
# Each computation is done in two ways: using apply over the columns, and
# directly with vectorized operations. Intuition says the latter should be
# faster, but as this example shows, that is not always the case.
#### GENERATE DATA ####
# 50 x 15000 matrix: each of the 15000 columns is one random 50-element vector
mm <- replicate(15000, rnorm(50))
md <- as.data.frame(mm)
# The vector whose distance to every column I want to measure
v <- rnorm(50)
#### MATRIX OPERATIONS ####
cat("Matrix time:\n")
# Method 1: apply
# Euclidean distance between two numeric vectors of the same length
distance <- function(a, b) sqrt(sum((a - b)^2))
cat(" * With apply\n")
print(system.time(d1 <- apply(mm, 2, function(x) distance(x, v))))
# Method 2: direct vectorized operations
cat(" * Vectorized operation\n")
# mm - v recycles v down each column, so v is subtracted from every column
print(system.time(d2 <- sqrt(colSums((mm - v)^2))))
cat(paste("Are the two results identical?:", identical(d1, d2), "\n"))
#### DATA FRAME OPERATIONS ####
cat("data.frame time:\n")
# Method 1: apply
cat(" * With apply\n")
# apply() first coerces the data.frame to a matrix with as.matrix()
print(system.time(d1 <- apply(md, 2, function(x) distance(x, v))))
# Method 2: direct vectorized operations
cat(" * Vectorized operation\n")
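# Arithmetic on a data.frame dispatches to Ops.data.frame, which processes the
# columns one at a time in R code; both - and ^ pay that cost for 15000 columns
print(system.time(d2 <- sqrt(colSums((md - v)^2))))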
cat(paste("Are the two results identical?:", identical(d1, d2), "\n"))
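The vectorized lines rely on R's recycling rule: when a matrix and a vector whose length equals the number of rows are subtracted, the vector is reused down each column in column-major order, so `mm - v` subtracts `v` from every column. A small self-contained check of that idea follows; the toy sizes, the seed and the `toy_*` names are arbitrary choices for illustration, not part of the original gist.

```r
# Toy check: colSums((m - v)^2) gives one squared Euclidean distance per column.
set.seed(1)                                  # arbitrary seed, for reproducibility only
toy_m <- matrix(rnorm(12), nrow = 3)         # 3 x 4 matrix, one observation per column
toy_v <- rnorm(3)                            # length equals nrow(toy_m)

# Recycling: element [i, j] of (toy_m - toy_v) is toy_m[i, j] - toy_v[i]
vectorized <- sqrt(colSums((toy_m - toy_v)^2))

# The same distances computed with an explicit loop over the columns
looped <- numeric(ncol(toy_m))
for (j in seq_len(ncol(toy_m))) {
  looped[j] <- sqrt(sum((toy_m[, j] - toy_v)^2))
}

print(all.equal(vectorized, looped))         # TRUE
```

This only lines up because the vector's length matches the number of rows; with any other length the subtraction would pair elements differently.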
rinze (author) commented Aug 24, 2013

My own results, in case anyone wants to check this but is too lazy to run the code:

Matrix time:
 * With apply
   user  system elapsed 
    0.2     0.0     0.2 
 * Vectorized operation
   user  system elapsed 
  0.004   0.008   0.014 
Are the two results identical?: TRUE 
data.frame time:
 * With apply
   user  system elapsed 
  0.504   0.000   0.503 
 * Vectorized operation
   user  system elapsed 
  6.008   0.036   6.055 
Are the two results identical?: TRUE 
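The header comment recommends converting the data.frame into a matrix, but the script never times that route. The following is a minimal sketch using the same data shape as the gist (50 rows, 15000 columns; the seed is an added assumption). Variant A converts with `as.matrix()` and then applies the vectorized expression, with the conversion included in the timing. Variant B, which is not in the original, uses `vapply()` over the data.frame's columns, since a data.frame is a list of columns. No timings are shown here because they depend on the machine and the R version.

```r
# Time the "convert first" route suggested in the gist's header comment.
set.seed(42)                                   # arbitrary, for reproducibility only
mm <- replicate(15000, rnorm(50))              # 50 x 15000 matrix, one observation per column
md <- as.data.frame(mm)
v  <- rnorm(50)
distance <- function(a, b) sqrt(sum((a - b)^2))

# Reference result: vectorized operation on the matrix
d_matrix <- sqrt(colSums((mm - v)^2))

# Variant A: convert the data.frame to a matrix, then use the vectorized expression.
# The conversion cost is included in the measured time.
print(system.time(d_convert <- sqrt(colSums((as.matrix(md) - v)^2))))

# Variant B (not in the original gist): a data.frame is a list of columns,
# so vapply() can call distance() on each column without building a matrix.
print(system.time(d_vapply <- vapply(md, distance, numeric(1), b = v)))

# all.equal() tolerates tiny floating-point differences; unname() drops the
# V1, V2, ... column names that both data.frame variants carry.
cat("Convert-first matches matrix result:", isTRUE(all.equal(unname(d_convert), d_matrix)), "\n")
cat("vapply matches matrix result:", isTRUE(all.equal(unname(d_vapply), d_matrix)), "\n")
```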
