@szilard
Last active November 29, 2016 01:06
data.table 20GB x 3GB join
library(data.table)
n <- 2e9   # rows in the large table d
m <- 1e9   # key values are drawn from 1..m; also the row count of dm
# Large table: integer keys sampled with replacement from 1..m, plus a random numeric column
system.time( d <- data.table(x = sample(m, n, replace=TRUE), y = runif(n)) )
#    user  system elapsed
# 103.843   8.255 112.242
# Lookup table: a random permutation of 1..m, so each key value appears exactly once
system.time( dm <- data.table(x = sample(m)) )
#    user  system elapsed
#  47.298   1.860  49.288
tables()
#      NAME          NROW NCOL     MB COLS KEY
# [1,] d    2,000,000,000    2 22,889 x,y
# [2,] dm   1,000,000,000    1  3,815 x
# Total: 26,704MB
# Sort d by x in place and mark x as its key
system.time( setkey(d, x) )
#    user  system elapsed
# 184.881   4.506 189.789
# Inner join on d's key: look up each row of dm in d; nomatch=0 drops rows of dm with no match
system.time( print(nrow(d[dm, nomatch=0])) )
#    user  system elapsed
# 349.744  11.486 361.933