@Sleepingwell
Created March 18, 2014 01:44
Timings of data.table vs. by() for a specific problem
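library(data.table)

nl <- 100      # number of distinct locations
stp <- 50000   # step between successive sample sizes
ss <- seq(stp, length.out = 20, by = stp)  # 20 sample sizes: stp, 2*stp, ...

# For each sample size, time two ways of finding the five most common
# origins per destination. Each column of `times` holds user/sys/elapsed
# for the data.table approach, then the same three for by().
times <- sapply(ss, function(ss) {
  mydata <- data.frame(
    to_location_id = as.factor(sample(nl, ss, TRUE)),
    from_location_id = as.factor(sample(nl, ss, TRUE)),
    gender = sample(c('m', 'f'), ss, TRUE),
    age = sample(70, ss, TRUE)
  )

  # Approach 1: aggregate with data.table, then loop over destinations.
  ptm <- proc.time()
  dt <- data.table(mydata)
  groupbydata <- dt[, list(count = .N), by = 'to_location_id,from_location_id']
  top5 <- list()
  for (toid in unique(mydata$to_location_id)) {
    # Rows for this destination, sorted by count; keep the top five.
    todata <- groupbydata[groupbydata$to_location_id == toid, ]
    top5[[toid]] <- todata[order(todata$count, decreasing = TRUE), ][1:5, ]
  }
  t1 <- proc.time() - ptm

  # Approach 2: base R by() with table() and sort().
  ptm <- proc.time()
  top.departures <- by(mydata, mydata$to_location_id, function(x) (tmp <- sort(table(x$from_location_id), decreasing = TRUE))[1:5]) # alternative: [1:min(length(tmp), 5)]
  t2 <- proc.time() - ptm

  c(t1[1:3], t2[1:3])
})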
plot(ss, 1:length(ss), ylim=range(times), type='n', xlab='number of obs', ylab='time (seconds)')
cols <- rainbow(6)
mapply(function(y, col) lines(ss, y, col=col), as.data.frame(t(times)), cols)
legend("topleft", c('dt (user)', 'dt (sys)', 'dt (elapsed)', 'by (user)', 'by (sys)', 'by (elapsed)'), lty=1, col=cols)
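
The data.table side of the timing above subsets and sorts inside a for loop. As a point of comparison, here is a minimal sketch of a fully grouped variant that stays inside data.table. It is not part of the original benchmark. The seed, the smaller sample size, and the names `counts` and `top5_grouped` are assumptions made for illustration only.

```r
library(data.table)

set.seed(1)   # hypothetical seed, only to make the sketch reproducible
n <- 1000     # hypothetical small sample size, not one of the benchmark sizes
dt <- data.table(
  to_location_id   = factor(sample(10, n, TRUE)),
  from_location_id = factor(sample(10, n, TRUE))
)

# Count rows per (destination, origin) pair.
counts <- dt[, .(count = .N), by = .(to_location_id, from_location_id)]

# Sort by descending count, then keep at most five rows per destination.
top5_grouped <- counts[order(-count), head(.SD, 5), by = to_location_id]
```

Unlike `[1:5, ]`, `head(.SD, 5)` returns fewer rows rather than NA padding when a destination has fewer than five origins. Whether this variant is faster than the loop would need to be measured with the same harness.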